Encoder, decoder, encoding method, decoding method, and recording medium

ABSTRACT

An encoder includes a generation unit configured to generate first image data constituted of a pixel group of a first color component and second image data constituted of a pixel group of a second color component differing from the first color component, from RAW image data in which the first color component and the second color component are arranged in a repeating fashion; and an encoding unit configured to encode the second image data generated by the generation unit on the basis of the first image data generated by the generation unit.

CLAIM OF PRIORITY

The present application claims priority from Japanese patent application JP 2018-005211 filed on Jan. 16, 2018, the content of which is hereby incorporated by reference into this application.

BACKGROUND

The present invention relates to an encoder, a decoder, an encoding method, a decoding method, an encoding program, and a decoding program.

There are techniques for compressing an image for each color component (see, for example, JP 2002-125241 A). However, in this conventional technique, the correlation between component frames is not used.

SUMMARY

An aspect of the disclosure of an encoder in this application is an encoder, comprising: a generation unit configured to generate first image data constituted of a pixel group of a first color component and second image data constituted of a pixel group of a second color component differing from the first color component, from RAW image data in which the first color component and the second color component are arranged in a repeating fashion; and an encoding unit configured to encode the second image data generated by the generation unit on the basis of the first image data generated by the generation unit.

Another aspect of the disclosure of an encoder in this application is an encoder, comprising: a generation unit configured to generate first image data constituted of a first color component and second image data constituted of a second color component differing from the first color component from RAW image data based on output from an image capture element in which a photoelectric conversion unit configured to perform photoelectric conversion of light of the first color component and a photoelectric conversion unit configured to perform photoelectric conversion of light of the second color component are arranged in a repeating fashion; and an encoding unit configured to encode the second image data generated by the generation unit on the basis of the first image data generated by the generation unit.

An aspect of the disclosure of a decoder in this application is a decoder, comprising: an acquisition unit configured to acquire first encoded image data in which first image data constituted of a pixel group of a first color component is encoded, and second encoded image data in which second image data constituted of a pixel group of a second color component differing from the first color component is encoded on the basis of the first image data; a decoding unit configured to decode the first encoded image data acquired by the acquisition unit to the first image data and decode the second encoded image data acquired by the acquisition unit to the second image data on the basis of the first image data; and a generation unit configured to generate RAW image data in which the first color component and the second color component are arranged in a repeating fashion, on the basis of the first image data and the second image data decoded by the decoding unit.

An aspect of the disclosure of an encoding method in this application is an encoding method, comprising: a generation unit configured to generate first image data constituted of a first color component and second image data constituted of a second color component differing from the first color component from RAW image data based on output from an image capture element in which a photoelectric conversion unit configured to perform photoelectric conversion of light of the first color component and a photoelectric conversion unit configured to perform photoelectric conversion of light of the second color component are arranged in a repeating fashion; and

an encoding unit configured to encode the second image data generated by the generation unit on the basis of the first image data generated by the generation unit.

An aspect of the disclosure of a decoding method in this application is a decoding method, comprising: an acquisition unit configured to acquire first encoded image data in which first image data constituted of a pixel group of a first color component is encoded, and second encoded image data in which second image data constituted of a pixel group of a second color component differing from the first color component is encoded on the basis of the first image data; a decoding unit configured to decode the first encoded image data acquired by the acquisition unit to the first image data and decode the second encoded image data acquired by the acquisition unit to the second image data on the basis of the first image data; and

a generation unit configured to generate RAW image data in which the first color component and the second color component are arranged in a repeating fashion, on the basis of the first image data and the second image data decoded by the decoding unit.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a descriptive drawing showing an encoding and decoding example of Embodiment 1.

FIG. 2 is a descriptive drawing showing an example of the color array shown in FIG. 1.

FIG. 3 is a block diagram showing a hardware configuration example of the information processing apparatus.

FIG. 4 is a block diagram showing a functional configuration example of the encoder according to Embodiment 1.

FIG. 5 is a descriptive drawing showing a generation example for component frames by the first generation unit.

FIG. 6 is a block diagram showing a configuration example of the encoding unit.

FIG. 7 is a descriptive drawing showing a reference direction example for component frames.

FIG. 8 is a descriptive view showing a detection example for a motion vector.

FIG. 9 is a descriptive view showing an example 1 of pixel position compensation prediction between component frames.

FIG. 10 is a descriptive view showing an example 2 of pixel position compensation prediction between component frames.

FIG. 11 is a descriptive view showing an example 3 of pixel position compensation prediction between component frames.

FIG. 12 is a descriptive view showing an example 4 of pixel position compensation prediction between component frames.

FIG. 13 is a descriptive view showing an example 5 of pixel position compensation prediction between component frames.

FIG. 14 is a descriptive view showing an example 6 of pixel position compensation prediction between component frames.

FIG. 15 is a descriptive view showing an example 7 of pixel position compensation prediction between component frames.

FIG. 16 is a descriptive view showing an example 8 of pixel position compensation prediction between component frames.

FIG. 17 is a descriptive drawing showing a data structure example for an encoded component frame.

FIG. 18 is a flowchart showing an example of encoding process steps by the encoder.

FIG. 19 is a block diagram showing a functional configuration example of the decoder.

FIG. 20 is a block diagram showing a configuration example of the decoding unit.

FIG. 21 is a flowchart showing an example of decoding process steps by the decoder.

FIG. 22 is a descriptive drawing showing an encoding and decoding example of Embodiment 2.

FIG. 23 is a block diagram showing a functional configuration example of the encoder according to Embodiment 2.

FIG. 24 is a flowchart showing an example of encoding process steps by the encoder according to Embodiment 2.

FIG. 25 is a block diagram showing a functional configuration example of the decoder according to Embodiment 2.

FIG. 26 is a flowchart showing an example of decoding process steps by the decoder according to Embodiment 2.

FIG. 27 is a descriptive drawing showing an encoding and decoding example of Embodiment 3.

FIG. 28 is a descriptive drawing showing a reference direction example for component frames.

FIG. 29 is a descriptive drawing showing an example of encoding at the slice level.

FIG. 30 is a descriptive drawing showing an encoding and decoding example of Embodiment 4.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Embodiments will be explained below with reference to the attached drawings. The image data to be encoded in the embodiments is RAW image data. RAW image data is, for example, image data that has undergone photoelectric conversion in an image capture element equipped with color filters in a Bayer arrangement and then been outputted, prior to being subjected to color interpolation and compression (JPEG (Joint Photographic Experts Group) format in the case of a still image, and MPEG (Moving Picture Experts Group) format in the case of a video).

However, the RAW image data may have undergone white balance adjustment.

Embodiment 1

<Encoding and Decoding Example>

FIG. 1 is a descriptive drawing showing an encoding and decoding example of Embodiment 1. (A) Separation and (B) encoding are executed by the encoder, and (C) decoding and (D) combining are executed by the decoder. The RAW image data 100 is image data in which a color array 101 having a plurality of color components is arranged periodically. In the case of a Bayer arrangement, for example, the color array 101 includes, in an arrangement of 2×2 pixels, color components arranged such that green (G1) is in the upper left, blue (B) is in the upper right, red (R) is in the lower left, and green (G2) is in the lower right. Other examples of the color array 101 will be described later with reference to FIG. 2.

(A) The encoder generates a component frame for each color component from the RAW image data 100. Specifically, for example, the encoder generates G1 image data 111 that is a color component frame for green (G1), G2 image data 112 that is a color component frame for green (G2), B image data 113 that is a color component frame for blue (B), and R image data 114 that is a color component frame for red (R).

The G1 image data 111 is image data constituted of a G1 pixel group from the color arrays 101 in the RAW image data 100. The G2 image data 112 is image data constituted of a G2 pixel group from the color arrays 101 in the RAW image data 100. The B image data 113 is image data constituted of a B pixel group from the color arrays 101 in the RAW image data 100. The R image data 114 is image data constituted of an R pixel group from the color arrays 101 in the RAW image data 100.
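As an illustration, the separation of step (A) amounts to a strided sub-sampling of the RAW mosaic. The following is a minimal Python/NumPy sketch assuming the Bayer layout of the color array 101 in FIG. 1 (G1 upper left, B upper right, R lower left, G2 lower right); the function name is ours, not the patent's.

```python
import numpy as np

def separate_component_frames(raw: np.ndarray):
    """Split V x H RAW image data into four V/2 x H/2 component frames."""
    g1 = raw[0::2, 0::2]  # G1: even rows, even columns (upper left of each color array)
    b  = raw[0::2, 1::2]  # B:  even rows, odd columns  (upper right)
    r  = raw[1::2, 0::2]  # R:  odd rows,  even columns (lower left)
    g2 = raw[1::2, 1::2]  # G2: odd rows,  odd columns  (lower right)
    return g1, g2, b, r

raw = np.arange(64, dtype=np.uint16).reshape(8, 8)  # V = H = 8, as in FIG. 5
g1, g2, b, r = separate_component_frames(raw)
assert g1.shape == (4, 4)  # each component frame is 1/4 the RAW image size
```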

(B) The encoder encodes the color component frames by prediction between the component frames. Specifically, for example, the encoder encodes one component frame group by in-frame prediction encoding to generate an I-picture, and encodes the remaining component frame groups by inter-frame prediction encoding to a P-picture or a B-picture. Here, the G1 image data 111 is encoded to G1 encoded image data 121, the G2 image data 112 is encoded to G2 encoded image data 122, the B image data 113 is encoded to B encoded image data 123, and the R image data 114 is encoded to R encoded image data 124.

(C) The decoder decodes the encoded component frame group. Specifically, for example, the decoder decodes the I-picture, and then sequentially decodes the following P-picture or B-picture to generate another component frame. In other words, the decoder decodes the G1 encoded image data 121, the G2 encoded image data 122, the B encoded image data 123, and the R encoded image data 124, to generate the G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114.

(D) The decoder combines the component frames in the decoded component frame group to generate the RAW image data 100. Specifically, for example, pixels G1, G2, B, and R in the same position in the G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114 are arranged according to the color array 101 to decode the RAW image data 100.
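Step (D) is the inverse interleaving of step (A). A minimal sketch under the same Bayer-layout assumption as above (the function name is ours):

```python
import numpy as np

def combine_component_frames(g1, g2, b, r):
    """Reassemble four V/2 x H/2 component frames into RAW image data."""
    v2, h2 = g1.shape
    raw = np.empty((2 * v2, 2 * h2), dtype=g1.dtype)
    raw[0::2, 0::2] = g1  # G1 back to the upper left of each color array
    raw[0::2, 1::2] = b   # B to the upper right
    raw[1::2, 0::2] = r   # R to the lower left
    raw[1::2, 1::2] = g2  # G2 to the lower right
    return raw
```

With lossless coding of the component frames, combine_component_frames(*separate_component_frames(raw)) reproduces raw bit for bit, which is the restoration property noted below.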

Thus, by performing inter-component-frame prediction of the RAW image data 100, relying on the property that the hue and the chroma give rise to a high degree of correlation among component frames, it is possible to improve encoding efficiency for the RAW image data 100. Also, it is possible to restore the original RAW image data 100 even if encoding is performed by inter-component-frame prediction encoding.

FIG. 2 is a descriptive drawing showing an example of the color array shown in FIG. 1. (a) indicates the same color array as the color array 101 shown in FIG. 1. (b) shows a color array 201 in which the B pixel and the R pixel are interchanged in position as compared to (a). (c) shows a color array 202 in which a pixel array (G1, R) in the left half and a pixel array (B, G2) in the right half of (a) are interchanged in position. (d) shows a color array 203 in which a pixel array (G1, B) in the left half and a pixel array (R, G2) in the right half of (b) are interchanged in position. (e) shows an example of a color array 204 of 6×6 pixels. The color array 204 of (e) includes green pixels in any pixel array, including a vertical pixel array (6 pixels), a horizontal pixel array (6 pixels), and a diagonal pixel array (3 or more pixels). Below, the color array 101 of (a) is used as an example of the color array.

<Hardware Configuration Example of Information Processing Apparatus>

FIG. 3 is a block diagram showing a hardware configuration example of the information processing apparatus. An information processing apparatus 300 is an apparatus including an encoder and/or a decoder. The information processing apparatus 300 may be an imaging apparatus such as a digital camera or a digital video camera, or a personal computer, a tablet, a smartphone, or a gaming device, for example.

The information processing apparatus 300 includes a processor 301, a storage device 302, an operation device 303, an LSI (Large Scale Integration) 304, an imaging unit 305, and a communication interface (communication IF) 306. These are connected to one another by a bus 308. The processor 301 controls the information processing apparatus 300. The storage device 302 serves as a work area of the processor 301.

The storage device 302 is a non-transitory or temporary recording medium which stores various programs and data. The storage device 302 can be, for example, a read-only memory (ROM), a random access memory (RAM), a hard disk drive (HDD), or a flash memory. The operation device 303 is used to input data. The operation device 303 can be, for example, a button, a switch, or a touch panel.

The LSI 304 is an integrated circuit that executes specific processes, including image processes such as color interpolation, contour enhancement, and gamma correction; an encoding process; a decoding process; a compression/decompression process; and the like.

The imaging unit 305 captures an image of a subject and generates RAW image data. The imaging unit 305 has an imaging optical system 351, an image capture element 353 having a color filter 352, and a signal processing circuit 354.

The imaging optical system 351 is constituted of a plurality of lenses including a zoom lens and a focus lens, for example. For a simplified view, in FIG. 3, one lens is depicted for the imaging optical system 351.

The image capture element 353 is a device for capturing an image of a subject using light beams passing through the imaging optical system 351.

The image capture element 353 may be a sequential scanning type solid-state image sensor (such as a CCD (charge-coupled device) image sensor), or may be an X-Y addressing type solid-state image capture element (such as a CMOS (complementary metal-oxide semiconductor) image sensor).

On the light-receiving surface of the image capture element 353, a pixel group having photoelectric conversion units is arranged in a matrix. For each pixel of the image capture element 353, a plurality of types of color filters 352 that respectively allow through light of differing color components are arranged in a predetermined color array. Thus, each pixel of the image capture element 353 outputs an electrical signal corresponding to each color component as a result of color separation by the color filter 352.

In Embodiment 1, for example, red (R), green (G), and blue (B) color filters 352 are arranged periodically on the light-receiving surface according to a Bayer arrangement of two rows by two columns. As an example, odd-numbered rows of the color array of the image capture element 353 have G and B pixels arranged alternately, whereas even-numbered rows of the color array have R and G pixels arranged alternately. The color array overall has green pixels arranged so as to form a checkered pattern. As a result, the image capture element 353 can acquire RAW image data in color during imaging.

The signal processing circuit 354 sequentially executes, on an image signal inputted from the image capture element 353, an analog signal process (correlated double sampling, black level correction, etc.), an A/D conversion process, and digital signal processing (defective pixel correction). The RAW image data 100 outputted from the signal processing circuit 354 is inputted to the LSI 304 or the storage device 302. The communication IF 306 connects to an external device via a network and transmits/receives data.

<Functional Configuration Example of Encoder>

FIG. 4 is a block diagram showing a functional configuration example of the encoder according to Embodiment 1. The encoder 400 has a first generation unit 401, an encoding unit 402, and a recording unit 403. The first generation unit 401 generates a component frame for each color component from the RAW image data 100, the encoding unit 402 encodes the component frames, and the recording unit 403 records the encoded component frames in the storage device 302.

The first generation unit 401, the encoding unit 402, and the recording unit 403 are specifically functions realized by the LSI 304, or by the processor 301 executing programs stored in the storage device 302, for example.

The first generation unit 401 generates first image data constituted of a pixel group of a first color component and second image data constituted of a pixel group of a second color component differing from the first color component, from RAW image data in which the first color component and the second color component are arranged in a repeating fashion. In other words, the first generation unit 401 generates the first image data constituted of the first color component and the second image data constituted of the second color component differing from the first color component, from RAW image data based on the output from an image capture element 353 in which pixels having a photoelectric conversion unit that performs photoelectric conversion of light of the first color component and pixels having a photoelectric conversion unit that performs photoelectric conversion of light of the second color component are arranged in a repeating fashion. The RAW image data 100 may be image data directly outputted from the image capture element 353, or may be a duplicate thereof, for example.

As described above, the color array 101 has four color components of three types: green (G1), green (G2), blue (B), and red (R).

Here, the first color component is any one of the green (G1), green (G2), blue (B), and red (R) constituting the color array 101. The second color component is blue (B) or red (R) if the first color component is green (G1) or green (G2); any one of green (G1), green (G2), and red (R) if the first color component is blue (B); and any one of green (G1), green (G2), and blue (B) if the first color component is red (R).

The first image data is the G1 image data 111 if the first color component is green (G1), the G2 image data 112 if the first color component is green (G2), the B image data 113 if the first color component is blue (B), and the R image data 114 if the first color component is red (R).

By contrast, the second image data is the B image data 113 or the R image data 114 if the first image data is the G1 image data 111 or the G2 image data 112; any one of the G1 image data 111, the G2 image data 112, and the R image data 114 if the first image data is the B image data 113; and any one of the G1 image data 111, the G2 image data 112, and the B image data 113 if the first image data is the R image data 114.

Also, the RAW image data 100 has a third color component that is the same as either the first color component or the second color component, or differs from both the first color component and the second color component. Specifically, for example, if the first color component is green (G1) and the second color component is blue (B), then the third color component is green (G2), which is the same color component as the green (G1) of the first color component, or is red (R), which is a color component differing from the first color component and the second color component. Also, if the first color component is green (G2) and the second color component is blue (B), the third color component is green (G1), which is the same color component as the green (G2) of the first color component, or is red (R), which is a color component differing from the first color component and the second color component.

Similarly, if the first color component is green (G1) and the second color component is red (R), then the third color component is green (G2), which is the same color component as the green (G1) of the first color component, or is blue (B), which is a color component differing from the first color component and the second color component. Also, if the first color component is green (G2) and the second color component is red (R), then the third color component is green (G1), which is the same color component as the green (G2) of the first color component, or is blue (B), which is a color component differing from the first color component and the second color component.

The third image data is the G2 image data 112 or the R image data 114 if the first image data is the G1 image data 111 and the second image data is the B image data 113. Also, the third image data is the G1 image data 111 or the R image data 114 if the first image data is the G2 image data 112 and the second image data is the B image data 113.

The third image data is the G2 image data 112 or the B image data 113 if the first image data is the G1 image data 111 and the second image data is the R image data 114. Also, the third image data is the G1 image data 111 or the B image data 113 if the first image data is the G2 image data 112 and the second image data is the R image data 114.

A fourth color component is the remaining color component, and the first generation unit 401 generates fourth image data of the remaining color component from the pixel group of the remaining color component in the RAW image data 100.

FIG. 5 is a descriptive drawing showing a generation example for component frames by the first generation unit 401. (a) is the RAW image data 100 to be encoded. The horizontal direction pixel count of the RAW image data 100 is H (in FIG. 5, H=8), and the vertical direction pixel count is V (in FIG. 5, V=8).

(b) to (e) show component frames generated by rearranging the color components of the RAW image data 100. The component frame of (b) is the G1 image data 111, the component frame of (c) is the G2 image data 112, the component frame of (d) is the B image data 113, and the component frame of (e) is the R image data 114.

The first generation unit 401 executes the separation of (A) in FIG. 1, thereby rearranging the G1 pixels, the B pixels, the R pixels, and the G2 pixels separated from the RAW image data 100 according to the positions of the color array 101. As a result, the first generation unit 401 generates four component frames including the G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114 from the one piece of RAW image data. The G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114 are respectively ¼ the image size (V/2×H/2) of the RAW image data 100.

Returning to FIG. 4, the encoding unit 402 encodes the second image data generated by the first generation unit 401 on the basis of the first image data generated by the first generation unit 401. Specifically, for example, the encoding unit 402 compensates the pixel positions between the first image data and the second image data to encode the second image data. Here, the “compensation of the pixel positions” refers to compensating a focus pixel in the second image data with a specific reference pixel in the first image data at a position differing from the focus pixel.

The G1 pixel, the G2 pixel, the B pixel, and the R pixel extracted from the same color array 101 of the RAW image data 100 are arranged at the same pixel positions in the respective component frames (G1 image data 111, G2 image data 112, B image data 113, R image data 114). However, among component frames, offsetting of the image occurs due to differences in pixel positions in the color array 101. Thus, the encoding unit 402 executes pixel position compensation among component frames generated from the same RAW image data 100, similar to the motion compensation among frames in the time axis direction for a normal encoding process.

Here, known examples of the encoding method for pixel position compensation prediction include AVC (Advanced Video Coding) as defined in ISO/IEC 14496-10. The encoding unit 402 executes in-frame prediction encoding for a specific component frame (such as the G1 image data 111) to generate an I-picture, and executes inter-frame prediction encoding for the remaining component frames (such as the G2 image data 112, the B image data 113, and the R image data 114) to generate a P-picture or a B-picture.

The I-picture is encoded image data attained by encoding completed within a component frame. The P-picture is encoded image data attained by performing inter-component-frame prediction encoding for a maximum of one reference component frame. The B-picture is encoded image data attained by performing inter-component-frame prediction encoding for a maximum of two reference component frames. A detailed configuration example of the encoding unit 402 will be explained below.
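The frame-type assignment can be pictured as follows: the first component frame in input order is coded on its own, and every later frame may lean on frames already coded. This is a schematic Python sketch only; encode_intra and encode_inter are hypothetical stand-ins for the AVC-style tools described above, not actual patent or library APIs.

```python
import numpy as np

def encode_intra(frame):
    # stand-in for in-frame prediction encoding (I-picture)
    return frame.copy()

def encode_inter(frame, references):
    # stand-in for inter-component-frame prediction: here, a residual against
    # the most recent reference (a P-picture would use up to one reference,
    # a B-picture up to two)
    return frame - references[-1]

def encode_component_frames(frames):
    """First frame -> I-picture; the rest -> P- or B-pictures."""
    encoded, references = [], []
    for frame in frames:
        if not references:
            encoded.append(("I", encode_intra(frame)))
        else:
            encoded.append(("P/B", encode_inter(frame, references)))
        references.append(frame)  # in practice, the locally decoded frame is kept
    return encoded
```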

<Configuration Example of Encoding Unit 402>

FIG. 6 is a block diagram showing a configuration example of the encoding unit 402. The encoding unit 402 has a first accumulation unit 601, a subtraction unit 602, an orthogonal transformation unit 603, a quantization unit 604, a variable-length coding unit 605, an inverse quantization unit 606, an inverse orthogonal transformation unit 607, an addition unit 608, a second accumulation unit 609, a position offset detection unit 610, and a first pixel position compensation unit 611.

The first accumulation unit 601 accumulates the component frames outputted from the first generation unit 401 (G1 image data 111, G2 image data 112, B image data 113, R image data 114). The component frames accumulated in the first accumulation unit 601 are outputted to the subtraction unit 602 as image data to be encoded in the order that the component frames were inputted. The image data that has been encoded is sequentially deleted from the first accumulation unit 601.

When generating the P-picture or the B-picture, the subtraction unit 602 outputs a difference signal (prediction error value) between a component frame of the inputted original image and a prediction value generated by the first pixel position compensation unit 611 to be described later. Also, when generating the I-picture, the subtraction unit 602 outputs the component frame of the inputted original image as is.

When generating the I-picture, the orthogonal transformation unit 603 performs orthogonal transformation on the component frames of the original image inputted after passing through the subtraction unit 602 without modification. Also, when generating the P-picture or the B-picture, the orthogonal transformation unit 603 performs orthogonal transformation on the above-mentioned difference signal.

The quantization unit 604 converts the frequency coefficient (orthogonal transformation coefficient) for each block inputted from the orthogonal transformation unit 603 into a quantization coefficient. The output from the quantization unit 604 is inputted to the variable-length coding unit 605 and the inverse quantization unit 606.

The variable-length coding unit 605 performs variable-length coding of a motion vector for positional offset from the position offset detection unit 610 (hereinafter, simply referred to as the “motion vector”) and outputs the encoded gradation component frames (I-picture, P-picture, B-picture).

The inverse quantization unit 606 performs inverse quantization on a quantized coefficient at the block level, which is the level at which encoding is performed, to decode the frequency coefficient. The inverse orthogonal transformation unit 607 performs inverse orthogonal transformation on the frequency coefficient decoded by the inverse quantization unit 606 to decode the prediction error value (or the component frames of the original image).

The addition unit 608 adds the decoded prediction error value to a prediction value (to be mentioned later) generated by the first pixel position compensation unit 611. Decoded values (reference component frames) of the picture outputted from the addition unit 608 are accumulated in the second accumulation unit 609. Component frames not referred to in pixel position compensation prediction thereafter are sequentially deleted from the second accumulation unit 609.
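A minimal sketch of this local decoding loop (units 603 to 608), assuming an orthonormal DCT from SciPy as the orthogonal transformation and a plain scalar quantizer with an illustrative step size q; the actual unit uses the block-level tools of the AVC-style codec.

```python
import numpy as np
from scipy.fft import dctn, idctn

def forward_path(block, q=8.0):
    coeff = dctn(block, norm="ortho")   # orthogonal transformation unit 603
    return np.round(coeff / q)          # quantization unit 604

def reconstruction_path(quant, q=8.0):
    coeff = quant * q                   # inverse quantization unit 606
    return idctn(coeff, norm="ortho")   # inverse orthogonal transformation unit 607

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float)   # a prediction error block
decoded_error = reconstruction_path(forward_path(block))
# The addition unit 608 would now add the prediction value to decoded_error
# and accumulate the result in the second accumulation unit 609.
```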

The position offset detection unit 610 uses a reference image from the second accumulation unit 609 to detect the motion vector indicating the offset in pixel position for predicting the component frames to be encoded. The motion vector is outputted to the first pixel position compensation unit 611 and the variable-length coding unit 605.

The first pixel position compensation unit 611 outputs the prediction values predicted at the block level for the component frames to be encoded on the basis of the motion vector and the reference component frame. The prediction values are outputted to the subtraction unit 602 and the addition unit 608.

If pixel position compensation prediction is to be performed for a given block, when the component frames to be encoded completely match the prediction values, only the motion vector is encoded. If the component frames to be encoded partially match the prediction values, the motion vector and a difference image are encoded. If none of the component frames to be encoded matches the prediction values, the image for the entire block is encoded.
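The three cases can be sketched as a small decision routine. This is our reading of the paragraph above, with a simple nonzero-count test standing in for whatever match criterion the encoder actually applies:

```python
import numpy as np

def choose_block_coding(block, prediction, vector):
    """Pick what to encode for one block: vector only, vector plus difference,
    or the entire block (names and criterion are illustrative)."""
    residual = block - prediction
    nonzero = np.count_nonzero(residual)
    if nonzero == 0:
        return ("vector only", vector)                    # complete match
    if nonzero < residual.size:
        return ("vector + difference", vector, residual)  # partial match
    return ("entire block", block)                        # no match
```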

<Reference Direction Example for Component Frames>

FIG. 7 is a descriptive drawing showing a reference direction example for component frames. (A) shows a reference direction for a case in which the component frames from the same RAW image data 100 are inputted in the order of the G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114. The G1 image data 111, which is the first component frame, is encoded to an I-picture. The subsequently inputted G2 image data 112 is encoded into a P-picture by inter-frame prediction encoding with the preceding G1 image data 111 as the reference component frame.

The subsequently inputted B image data 113 is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the preceding G1 image data 111 and G2 image data 112 in each block as the reference component frame. The R image data 114 inputted last is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the preceding G1 image data 111, G2 image data 112, and B image data 113 in each block as the reference component frame.

(B) shows a reference direction for a case in which the component frames from the same RAW image data 100 are inputted in the order of the B image data 113, the R image data 114, the G1 image data 111, and the G2 image data 112. The B image data 113, which is the first component frame, is encoded to an I-picture. The subsequently inputted R image data 114 is encoded into a P-picture by inter-frame prediction encoding with the preceding B image data 113 as the reference component frame.

The subsequently inputted G1 image data 111 is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the preceding B image data 113 and R image data 114 as the reference component frame. The G2 image data 112 inputted last is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the preceding B image data 113, R image data 114, and G1 image data 111 as the reference component frame.

The reference directions shown in FIG. 7 are merely examples, and encoding is possible in an input order for component frames other than (A) and (B). In other words, the first component frame is encoded to an I-picture and the subsequent component frames are encoded to a P-picture or a B-picture. Also, this encoding unit 402 uses the luminance value from pixels in the image capture element 353 that do not depend on the color components, and thus can perform encoding even when differing color components are used as the reference frame.

<Motion Vector Detection Example>

FIG. 8 is a descriptive view showing a detection example for a motion vector. (A) shows the RAW image data 100 and the component frames, and (B) to (M) show a detection example for motion vectors. In FIG. 8, in order to simplify the explanation, as shown in (A), the RAW image data 100 is a frame where H=4 pixels and V=4 pixels. Also, in order to distinguish from another instance of the same color component, “a” to “d” and “x” are suffixed onto the reference character of the color component.

Additionally, the positional offset of the reference pixels of the reference component frame in relation to the position of the focus pixel of the component frame to be predicted is set to the motion vector V(x,y). In the motion vector V(x,y), x increases by an offset to the right, x decreases by an offset to the left, y increases by a downward offset, and y decreases by an upward offset. In FIG. 8, the motion vector V(x,y) is indicated by a black arrow.
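Under this convention, the motion vector is simply the reference position minus the focus position, in units of component-frame pixels. A tiny helper (name and coordinate tuples are ours) reproduces the values derived in (C) and (D) below:

```python
def motion_vector(ref_xy, focus_xy):
    """V(x,y): offset of the reference pixel relative to the focus pixel;
    x positive to the right, y positive downward."""
    return (ref_xy[0] - focus_xy[0], ref_xy[1] - focus_xy[1])

# (C): reference Ba is one pixel to the left of focus Rx -> V(B) = (-1, 0)
assert motion_vector((0, 0), (1, 0)) == (-1, 0)
# (D): reference Bd is one pixel below focus Rx -> V(B) = (0, 1)
assert motion_vector((0, 1), (0, 0)) == (0, 1)
```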

(B) to (E) show examples for detecting the motion vector V(B) when the reference component frame is the B image data 113, the component frame to be predicted is the R image data 114, and the focus pixel is a pixel Rx in the R image data 114. In (B) to (E), the position offset detection unit 610 detects the motion vector for predicting the focus pixel using one reference pixel. As a result, the R image data 114 can be encoded into a P-picture.

In (B), the reference pixel of the B image data 113 is a pixel Bb. The focus pixel Rx is at the same pixel position as the reference pixel Bb. In other words, there is no positional offset between the reference pixel Bb and the focus pixel Rx. Therefore, when predicting the focus pixel Rx using the pixel Bb, the motion vector V(B)=(0,0). That is, the motion vector V(B) is not detected.

In (C), the reference pixel of the B image data 113 is a pixel Ba. The focus pixel Rx is at a pixel position offset from the reference pixel Ba by one pixel to the right. In other words, there is a positional offset between the reference pixel Ba and the focus pixel Rx. Therefore, when predicting the focus pixel Rx using the reference pixel Ba, the motion vector V(B)=(−1,0) is detected.

In (D), the reference pixel of the B image data 113 is a pixel Bd. The focus pixel Rx is at a pixel position offset from the reference pixel Bd by one pixel upward. In other words, there is a positional offset between the reference pixel Bd and the focus pixel Rx. Therefore, when predicting the focus pixel Rx using the reference pixel Bd, the motion vector V(B)=(0,1) is detected.

In (E), the reference pixel of the B image data 113 is a pixel Bc. The focus pixel Rx is at a pixel position offset from the reference pixel Bc by one pixel to the right and one pixel upward. In other words, there is a positional offset between the reference pixel Bc and the focus pixel Rx. Therefore, when predicting the focus pixel Rx using the reference pixel Bc, the motion vector V(B)=(−1,1) is detected.

(F) to (J) show examples for detecting the motion vector V(B) when the reference component frame is the B image data 113, the component frame to be predicted is the R image data 114, and the focus pixel is a pixel Rx in the R image data 114. In (F) to (J), the position offset detection unit 610 detects the motion vector for predicting the focus pixel using a plurality of reference pixels of the same color component. As a result, the R image data 114 can be encoded into a P-picture.

In (F), the reference pixels of the B image data 113 are the pixels Ba to Bd. The focus pixel Rx is at the same pixel position as the reference pixel Bb. In the prediction by the reference pixels Ba to Bd, the average reference pixel position is the center of the reference pixels Ba to Bd, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels Ba to Bd by 0.5 pixels to the right and 0.5 pixels upward. Therefore, when predicting the focus pixel Rx using the reference pixels Ba to Bd, the motion vector V(B)=(−0.5,0.5) is detected.

In (G), the reference pixels of the B image data 113 are the pixels Bb and Bd. The focus pixel Rx is at the same pixel position as the reference pixel Bb. In the prediction by the reference pixels Bb and Bd, the average reference pixel position is the center of the reference pixels Bb and Bd, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels Bb and Bd by 0.5 pixels upward. Therefore, when predicting the focus pixel Rx using the reference pixels Bb and Bd, the motion vector V(B)=(0,0.5) is detected.

In (H), the reference pixels of the B image data 113 are the pixels Ba and Bc. The focus pixel Rx is at the same pixel position as the reference pixel Bb. In the prediction by the reference pixels Ba and Bc, the average reference pixel position is the center of the reference pixels Ba and Bc, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels Ba and Bc by 1 pixel to the right and 0.5 pixels upward. Therefore, when predicting the focus pixel Rx using the reference pixels Ba and Bc, the motion vector V(B)=(−1,0.5) is detected.

In (I), the reference pixels of the B image data 113 are the pixels Ba and Bb. The focus pixel Rx is at the same pixel position as the reference pixel Bb. In the prediction by the reference pixels Ba and Bb, the average reference pixel position is the center of the reference pixels Ba and Bb, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels Ba and Bb by 0.5 pixels to the right. Therefore, when predicting the focus pixel Rx using the reference pixels Ba and Bb, the motion vector V(B)=(−0.5,0) is detected.

In (J), the reference pixels of the B image data 113 are the pixels Bc and Bd. The focus pixel Rx is at the same pixel position as the reference pixel Bb. In the prediction by the reference pixels Bc and Bd, the average reference pixel position is the center of the reference pixels Bc and Bd, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels Bc and Bd by 0.5 pixels to the right and 1 pixel upward. Therefore, when predicting the focus pixel Rx using the reference pixels Bc and Bd, the motion vector V(B)=(−0.5,1) is detected.
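With several reference pixels of one color component, the effective reference position is the average of their positions, which is how the half-pel vectors above arise. A sketch (coordinates ours, with x to the right and y downward as in FIG. 8):

```python
def averaged_motion_vector(ref_positions, focus_xy):
    """V(x,y) when predicting from the average of several reference pixels."""
    n = len(ref_positions)
    cx = sum(x for x, y in ref_positions) / n  # average reference pixel position
    cy = sum(y for x, y in ref_positions) / n
    return (cx - focus_xy[0], cy - focus_xy[1])

# (F): Ba(0,0), Bb(1,0), Bc(0,1), Bd(1,1); focus Rx at Bb -> V(B) = (-0.5, 0.5)
assert averaged_motion_vector([(0, 0), (1, 0), (0, 1), (1, 1)], (1, 0)) == (-0.5, 0.5)
# (G): Bb(1,0) and Bd(1,1); focus Rx at Bb -> V(B) = (0, 0.5)
assert averaged_motion_vector([(1, 0), (1, 1)], (1, 0)) == (0.0, 0.5)
```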

(K) to (M) show examples for detecting the motion vector V when the reference component frame is the G1 image data 111 and/or the G2 image data 112, the component frame to be predicted is the R image data 114, and the focus pixel is a pixel Rx in the R image data 114. In (K) to (M), the position offset detection unit 610 detects the motion vector for predicting the focus pixel using a plurality of reference pixels of the same color component or differing color components. As a result, the R image data 114 can be encoded into a P-picture or a B-picture.

In (K), the reference pixels of the G1 image data 111 are pixels G1b and G1d. The focus pixel Rx is at the same pixel position as the reference pixel G1b. In the prediction by the reference pixels G1b and G1d, the average reference pixel position is the center of the reference pixels G1b and G1d, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels G1b and G1d by 0.5 pixels upward. Therefore, when predicting the focus pixel Rx using the reference pixels G1b and G1d, the motion vector V(G1)=(0,0.5) is detected.

In (L), the reference pixels of the G2 image data 112 are pixels G2a and G2b. The focus pixel Rx is at the same pixel position as the reference pixel G2b. In the prediction by the reference pixels G2a and G2b, the average reference pixel position is the center of the reference pixels G2a and G2b, and there is a positional offset from the focus pixel Rx.

In other words, the focus pixel Rx is at a pixel position offset from the center of the reference pixels G2a and G2b by 0.5 pixels to the right. Therefore, when predicting the focus pixel Rx using the reference pixels G2a and G2b, the motion vector V(G2)=(−0.5,0) is detected.

In (M), the G1 image data 111 and the G2 image data 112 are reference component frames. Thus, the motion vector of (M) is a resultant motion vector V(G) that is a combination of the motion vector V(G1) of (K) and the motion vector V(G2) of (L). Therefore, when predicting the focus pixel Rx using the reference pixels G1b and G1d and the reference pixels G2a and G2b, the resultant motion vector V(G)=(−0.5,0.5) is detected.
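Reading “combination” as a component-wise vector sum, the value in (M) follows directly from (K) and (L); this arithmetic check is ours:

```python
v_g1 = (0.0, 0.5)   # from (K)
v_g2 = (-0.5, 0.0)  # from (L)
v_g = (v_g1[0] + v_g2[0], v_g1[1] + v_g2[1])
assert v_g == (-0.5, 0.5)  # resultant motion vector V(G) in (M)
```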

In FIG. 8, the R image data 114 was set to be predicted, but the G1 image data 111, the G2 image data 112, and the B image data 113 may be set to be predicted instead. Also, the R image data 114 may be set to be the reference component frame. Additionally, the reference component frames of both directions in (M) may be differing color components as opposed to the same color component.

In this manner, the G1 pixel, the G2 pixel, the B pixel, and the R pixel extracted from the same color array 101 of the RAW image data 100 are arranged at the same pixel positions in the respective component frames. However, among component frames, offsetting of the image occurs due to differences in pixel positions in the color array 101. Thus, pixel position compensation prediction among component frames is performed in consideration of the differing pixel positions in the color array 101.

<Example of Pixel Position Compensation Prediction Between Component Frames>

Below, an example of pixel position compensation prediction between component frames will be described with reference to FIGS. 9 to 16. In FIGS. 9 to 16, the arrangement of pixels in the RAW image data 100 is indicated with circles. The range of each color array (sample point of component frames) is indicated with frame borders.

FIG. 9 is a descriptive view showing an example 1 of pixel position compensation prediction between component frames. The reference pattern of (A) shows an example in which the component frame to be predicted is the G2 image data 112, and the value of the focus pixel G2x thereof is predicted using the average of the four surrounding pixels G1a to G1d of the G1 image data 111 that is the reference component frame adjacent to the focus pixel G2x.

In (A), the pixel G1a and the pixel G2x belong to the same sample point, but the focus pixel G2x is affected by the pixels G1b to G1d due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels to the right and 0.5 pixels downward in relation to the position of the focus pixel G2x. Therefore, in this case, the motion vector V(G1)=(0.5,0.5).

In this manner, in (A), by predicting the pixel value of the focus pixel G2x by averaging the four adjacent pixels G1a to G1d, the G2 image data 112 is encoded to the P-picture. Therefore, the G1 image data 111 can be used to perform a prediction based on the pixel position of the focus pixel G2x. Also, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

The reference pattern of (B) shows an example in which the component frame to be predicted is the G2 image data 112, and the value of the focus pixel G2x thereof is predicted using the average of the two adjacent pixels G1a and G1c of the G1 image data 111 that is the reference component frame adjacent to the focus pixel G2x. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. The pixel value of the focus pixel G2x is predicted according to the average of the two adjacent pixels G1a and G1c, and thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

In (B), the pixel G1a and the pixel G2x belong to the same sample point, but the focus pixel G2x is affected by the pixels G1a and G1c due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels downward in relation to the position of the focus pixel G2x. Therefore, in this case, the motion vector V(G1)=(0,0.5).

The reference pattern of (C) shows an example in which the component frame to be predicted is the G2 image data 112, and the value of the focus pixel G2x thereof is predicted using the average of the two adjacent pixels G1b and G1d of the G1 image data 111 that is the reference component frame adjacent to the focus pixel G2x. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. The pixel value of the focus pixel G2x is predicted according to the average of the two adjacent pixels G1b and G1d, and thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

In (C), the pixel G1a and the pixel G2x belong to the same sample point, but the focus pixel G2x is affected by the pixels G1b and G1d due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by one pixel to the right and 0.5 pixels downward in relation to the position of the focus pixel G2x. Therefore, in this case, the motion vector V(G1)=(1,0.5).

The reference pattern of (D) shows an example in which the component frame to be predicted is the G2 image data 112, and the value of the focus pixel G2x thereof is predicted using the average of the two adjacent pixels G1a and G1b of the G1 image data 111 that is the reference component frame adjacent to the focus pixel G2x. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. The pixel value of the focus pixel G2x is predicted according to the average of the two adjacent pixels G1a and G1b, and thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

In (D), the pixel G1a and the pixel G2x belong to the same sample point, but the focus pixel G2x is affected by the pixels G1a and G1b due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels to the right in relation to the position of the focus pixel G2x. Therefore, in this case, the motion vector V(G1)=(0.5,0).

The reference pattern of (E) shows an example in which the component frame to be predicted is the G2 image data 112, and the value of the focus pixel G2x thereof is predicted using the average of the two adjacent pixels G1c and G1d of the G1 image data 111 that is the reference component frame adjacent to the focus pixel G2x. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. The pixel value of the focus pixel G2x is predicted according to the average of the two adjacent pixels G1c and G1d, and thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

In (E), the pixel G1a and the pixel G2x belong to the same sample point, but the focus pixel G2x is affected by the pixels G1c and G1d due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels to the right and one pixel downward in relation to the position of the focus pixel G2x. Therefore, in this case, the motion vector V(G1)=(0.5,1).

In this manner, in (B) to (E), the pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge or a horizontal edge. By predicting the pixel value of the focus pixel G2x by averaging the two adjacent pixels, the G2 image data 112 is encoded to the P-picture. Thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.
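The five FIG. 9 reference patterns differ only in which G1 neighbours are averaged and in the resulting vector. A sketch assuming the layout G1a (same sample point as the focus pixel G2x), G1b one to the right, G1c one below, and G1d to the lower right; the names and table are ours:

```python
import numpy as np

# pattern -> (neighbour keys averaged for the prediction, motion vector V(G1))
FIG9_PATTERNS = {
    "A": (("a", "b", "c", "d"), (0.5, 0.5)),  # four surrounding pixels
    "B": (("a", "c"), (0.0, 0.5)),            # vertical edge
    "C": (("b", "d"), (1.0, 0.5)),            # vertical edge
    "D": (("a", "b"), (0.5, 0.0)),            # horizontal edge
    "E": (("c", "d"), (0.5, 1.0)),            # horizontal edge
}

def predict_g2x(g1: dict, pattern: str):
    """Return the predicted value of G2x and the associated motion vector."""
    keys, vector = FIG9_PATTERNS[pattern]
    return float(np.mean([g1[k] for k in keys])), vector

g1 = {"a": 100.0, "b": 102.0, "c": 98.0, "d": 100.0}
print(predict_g2x(g1, "A"))  # (100.0, (0.5, 0.5))
```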

FIG. 10 is a descriptive view showing an example 2 of pixel position compensation prediction between component frames. In FIG. 10, the value of the focus pixel of the component frame to be predicted is predicted according to the value of the pixel at a differing position in a reference component frame that is the same color component as the color component to be predicted. In FIG. 10, the component frame to be predicted is the G2 image data 112, the focus pixel thereof is the pixel G2x, and the reference component frame is the G1 image data 111.

In the reference pattern of (A), the value of the focus pixel G2x of the G2 image data 112 that is the component frame to be predicted is predicted according to the value of the pixel G1a positioned to the upper left in the RAW image data. In this case, the reference pixel (the sample point belonging to G1a) of the G1 image data 111 that is the reference component frame is in the same position as the focus pixel (the sample point belonging to G2x) in the G2 image data 112 that is the component frame to be predicted. Therefore, the motion vector V(G1)=(0,0). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the lower right to the upper left.

In the reference pattern of (B), the value of the focus pixel G2x of the G2 image data 112 that is the component frame to be predicted is predicted according to the value of the pixel G1b positioned to the upper right in the RAW image data. In this case, the reference pixel (the sample point belonging to G1b) of the G1 image data 111 that is the reference component frame is offset by one pixel to the right as compared to the position of the focus pixel (the sample point belonging to G2x) in the G2 image data 112 that is the component frame to be predicted. Therefore, the motion vector V(G1)=(1,0). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the lower left to the upper right.

In the reference pattern of (C), the value of the focus pixel G2x of the G2 image data 112 that is the component frame to be predicted is predicted according to the value of the pixel G1c positioned to the lower left in the RAW image data. In this case, the reference pixel (the sample point belonging to G1c) of the G1 image data 111 that is the reference component frame is offset by one pixel downward as compared to the position of the focus pixel (the sample point belonging to G2x) in the G2 image data 112 that is the component frame to be predicted. Therefore, the motion vector V(G1)=(0,1). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the lower left to the upper right.

In the reference pattern of (D), the value of the focus pixel G2x of the G2 image data 112 that is the component frame to be predicted is predicted according to the value of the pixel G1d positioned to the lower right in the RAW image data 100. In this case, the reference pixel (the sample point belonging to G1d) of the G1 image data 111 that is the reference component frame is offset by one pixel to the right and one pixel downward as compared to the position of the focus pixel (the sample point belonging to G2x) in the G2 image data 112 that is the component frame to be predicted. Therefore, the motion vector V(G1)=(1,1). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the lower right to the upper left.

In this manner, by predicting the pixel value of the focus pixel G2x using one pixel G1, the G2 image data 112 is encoded to the P-picture. Thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

FIG. 11 is a descriptive view showing an example 3 of pixel position compensation prediction between component frames. In FIG. 11, the value of the focus pixel of the component frame to be predicted is predicted according to the value of the pixel at a differing position in a reference component frame that differs in color component from the component frame to be predicted. In FIG. 11, the component frame to be predicted is the B image data 113, the focus pixel thereof is the pixel Bx, and the reference component frame is the G1 image data 111 or the G2 image data 112.

The reference pattern of (A) shows an example in which the value of the focus pixel Bx of the B image data 113 that is the component frame to be predicted is predicted using the average of the two adjacent pixels G1a and G1b of the G1 image data 111 that is the reference component frame adjacent to the focus pixel Bx. By predicting the pixel value of the focus pixel Bx by averaging the two adjacent pixels G1a and G1b, the B image data 113 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

In (A), the pixel G1a and the focus pixel Bx belong to the same sample point, but the focus pixel Bx is affected by the pixels G1a and G1b due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels to the right in relation to the position of the focus pixel Bx. Therefore, in this case, the motion vector V(G1)=(0.5,0).

The reference pattern of (B) shows an example in which the value of the focus pixel Bx of the B image data 113 that is the component frame to be predicted is predicted using the average of the two adjacent pixels G2a and G2b of the G2 image data 112 that is the reference component frame adjacent to the focus pixel Bx. By predicting the pixel value of the focus pixel Bx by averaging the two adjacent pixels G2a and G2b, the B image data 113 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the G2 image data 112, which is the reference component frame.

In (B), the reference pixel G2 b and the focus pixel Bx belong to the same sample point, but the focus pixel Bx is affected by the pixels G2 a and G2 b due to interpolation. Thus, the range of the reference pixel of the G2 image data 112 that is the reference component frame is offset by 0.5 pixels upward in relation to the position of the focus pixel Bx. Therefore, in this case, the motion vector V(G2)=(0,−0.5).

The reference pattern of (C) is a combination of the pixel position compensation prediction of (A) and the pixel position compensation prediction of (B). In other words, (C) is an example in which the value of the focus pixel Bx of the B image data 113 that is the component frame to be predicted is predicted using the average of two adjacent pixels G1 a and G1 b of the G1 image data 111 and two adjacent pixels G2 a and G2 b of the G2 image data 112, the G1 image data 111 and the G2 image data 112 being the reference component frames adjacent to the focus pixel Bx.

By predicting the pixel value of the focus pixel Bx by averaging the four adjacent pixels G1 a, G1 b, G2 a, and G2 b, the B image data 113 is encoded to the B-picture. Thus, it is possible to further mitigate encoding distortion included in the decoded value of the G1 image data 111 and the G2 image data 112, which are the reference component frames.

In (C), the pixels G1 a and G2 b and the focus pixel Bx belong to the same sample point, but the focus pixel Bx is affected by the pixels G1 a, G1 b, G2 a, and G2 b due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels to the right in relation to the position of the focus pixel Bx, and the range of the reference pixel of the G2 image data 112 is offset by 0.5 pixels upward in relation to the position of the focus pixel Bx. Therefore, in this case, the motion vector V(G) is defined as V(G1)+V(G2)=(0.5,−0.5).
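The reference patterns of FIG. 11 translate directly into array operations. The following is a minimal numpy sketch of patterns (A) to (C), assuming the G1, G2, and B component frames are equally sized 2-D arrays whose coordinates align as described above; the function name, the wrap-around edge handling via np.roll, and the integer rounding are illustrative assumptions, not part of the embodiment.

```python
import numpy as np

def predict_b_component(g1, g2, pattern):
    """Sketch of the FIG. 11 reference patterns for predicting the B
    component frame from the G1/G2 reference component frames."""
    g1 = g1.astype(np.int32)
    g2 = g2.astype(np.int32)

    # Pattern (A): average of two horizontally adjacent G1 pixels;
    # the reference range is offset by 0.5 pixels to the right,
    # i.e. the motion vector V(G1) = (0.5, 0).
    pred_a = (g1 + np.roll(g1, -1, axis=1)) // 2

    # Pattern (B): average of two vertically adjacent G2 pixels;
    # the reference range is offset by 0.5 pixels upward,
    # i.e. V(G2) = (0, -0.5).
    pred_b = (g2 + np.roll(g2, 1, axis=0)) // 2

    if pattern == 'A':
        return pred_a
    if pattern == 'B':
        return pred_b
    # Pattern (C): average of all four pixels (B-picture case),
    # approximated here as the mean of the two directional predictions.
    return (pred_a + pred_b) // 2
```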

FIG. 12 is a descriptive view showing an example 4 of pixel position compensation prediction between component frames. In FIG. 12, the value of the focus pixel of the component frame to be predicted is predicted according to the value of the pixel at a differing position in a reference component frame that differs in color component from the component frame to be predicted. In FIG. 12, the component frame to be predicted is the R image data 114, the focus pixel thereof is the pixel Rx, and the reference component frame is the G1 image data 111 or the G2 image data 112.

The reference pattern of (A) shows an example in which the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted using the average of two adjacent pixels G1 a and G1 b of the G1 image data 111 that is the reference component frame adjacent to the focus pixel Rx. By predicting the pixel value of the focus pixel Rx by averaging the two adjacent pixels G1 a and G1 b, the R image data 114 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111, which is the reference component frame.

In (A), the pixel G1 a and the focus pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels G1 a and G1 b due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels downward in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(G1)=(0,0.5).

The reference pattern of (B) shows an example in which the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted using the average of two adjacent pixels G2 a and G2 b of the G2 image data 112 that is the reference component frame adjacent to the focus pixel Rx. By predicting the pixel value of the focus pixel Rx by averaging the two adjacent pixels G2 a and G2 b, the R image data 114 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the G2 image data 112, which is the reference component frame.

In (B), the reference pixel G2 b and the focus pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels G2 a and G2 b due to interpolation. Thus, the range of the reference pixel of the G2 image data 112 that is the reference component frame is offset by 0.5 pixels to the left in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(G2)=(−0.5,0).

The reference pattern of (C) is a combination of the pixel position compensation prediction of (A) and the pixel position compensation prediction of (B). In other words, (C) is an example in which the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted using the average of two adjacent pixels G1 a and G1 b of the G1 image data 111 and two adjacent pixels G2 a and G2 b of the G2 image data 112, the G1 image data 111 and the G2 image data 112 being the reference component frames adjacent to the focus pixel Rx.

By predicting the pixel value of the focus pixel Rx by averaging the four adjacent pixels G1 a, G1 b, G2 a, and G2 b, the R image data 114 is encoded to the B-picture. Thus, it is possible to mitigate encoding distortion included in the decoded value of the G1 image data 111 and the G2 image data 112, which are the reference component frames.

In (C), the pixels G1 a and G2 b and the focus pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels G1 a, G1 b, G2 a, and G2 b due to interpolation. Thus, the range of the reference pixel of the G1 image data 111 that is the reference component frame is offset by 0.5 pixels downward in relation to the position of the focus pixel Rx, and the range of the reference pixel of the G2 image data 112 is offset by 0.5 pixels to the left in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(G) is defined as V(G1)+V(G2)=(−0.5,0.5).

FIG. 13 is a descriptive view showing an example 5 of pixel position compensation prediction between component frames. The reference pattern of (A) shows an example in which the component frame to be predicted is the R image data 114, and the value of the focus pixel Rx thereof is predicted using the average of four surrounding pixels Ba to Bd of the B image data 113 that is the reference component frame adjacent to the focus pixel Rx.

In (A), the pixel Bb and the pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels Ba to Bd due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels to the left and 0.5 pixels downward in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(B)=(−0.5,0.5).

In this manner, by predicting the pixel value of the focus pixel Rx by averaging the four adjacent pixels Ba to Bd, the R image data 114 is encoded to the P-picture. Therefore, the B image can be used to perform a prediction based on the pixel position of the focus pixel Rx. Also, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

The reference pattern of (B) shows an example in which the component frame to be predicted is the R image data 114, and the value of the focus pixel Rx thereof is predicted using the average of two adjacent pixels Ba and Bc of the B image data 113 that is the reference component frame adjacent to the focus pixel Rx. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. The pixel value of the focus pixel Rx is predicted according to the average of the two adjacent pixels Ba and Bc, and thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

In (B), the pixel Bb and the pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels Ba and Bc due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by one pixel to the left and 0.5 pixels downward in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(B)=(−1,0.5).

The reference pattern of (C) shows an example in which the component frame to be predicted is the R image data 114, and the value of the focus pixel Rx thereof is predicted using the average of two adjacent pixels Bb and Bd of the B image data 113 that is the reference component frame adjacent to the focus pixel Rx. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. The pixel value of the focus pixel Rx is predicted according to the average of the two adjacent pixels Bb and Bd, and thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

In (C), the pixel Bb and the pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels Bb and Bd due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels downward in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(B)=(0,0.5).

The reference pattern of (D) shows an example in which the component frame to be predicted is the R image data 114, and the value of the focus pixel Rx thereof is predicted using the average of two adjacent pixels Ba and Bb of the B image data 113 that is the reference component frame adjacent to the focus pixel Rx. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. The pixel value of the focus pixel Rx is predicted according to the average of the two adjacent pixels Ba and Bb, and thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

In (D), the pixel Bb and the pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels Ba and Bb due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels to the left in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(B)=(−0.5,0).

The reference pattern of (E) shows an example in which the component frame to be predicted is the R image data 114, and the value of the focus pixel Rx thereof is predicted using the average of two adjacent pixels Bc and Bd of the B image data 113 that is the reference component frame adjacent to the focus pixel Rx. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. The pixel value of the focus pixel Rx is predicted according to the average of the two adjacent pixels Bc and Bd, and thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

In (E), the pixel Bb and the pixel Rx belong to the same sample point, but the focus pixel Rx is affected by the pixels Bc and Bd due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels to the left and one pixel downward in relation to the position of the focus pixel Rx. Therefore, in this case, the motion vector V(B)=(−0.5,1).

In this manner, in (B) to (E), by predicting the pixel value of the focus pixel Rx by averaging two adjacent pixels, the R image data 114 is encoded to the P-picture. Thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

FIG. 14 is a descriptive view showing an example 6 of pixel position compensation prediction between component frames. In FIG. 14, the value of the focus pixel of the component frame to be predicted is predicted according to the value of the pixel at a differing position in a reference component frame that differs in color component from the component frame to be predicted. In FIG. 14, the component frame to be predicted is the R image data 114, the focus pixel thereof is the pixel Rx, and the reference component frame is the B image data 113.

In the reference pattern of (A), the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted according to the value of the pixel Ba positioned to the upper left in the RAW image data 100. In this case, the reference pixel (sample point belonging to Ba) of the B image data 113 that is the reference component frame is offset by one pixel to the left as compared to the position of the focus pixel (sample point belonging to Rx) in the R image data 114 that is the component frame to be predicted. Therefore, the motion vector V(B)=(−1,0). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the lower right to the upper left.

In the reference pattern of (B), the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted according to the value of the pixel Bb positioned to the upper right in the RAW image data 100. In this case, the reference pixel (sample point belonging to Bb) of the B image data 113 that is the reference component frame is in the same position as the focus pixel (sample point belonging to Rx) in the R image data 114 that is the component frame to be predicted. Therefore, the motion vector V(B)=(0,0). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the lower left to the upper right.

In the reference pattern of (C), the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted according to the value of the pixel Bd positioned to the lower right in the RAW image data 100. In this case, the reference pixel (sample point belonging to Bd) of the B image data 113 that is the reference component frame is offset by one pixel downward as compared to the position of the focus pixel (sample point belonging to Rx) in the R image data 114 that is the component frame to be predicted. Therefore, the motion vector V(B)=(0,1). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the upper left to the lower right.

In the reference pattern of (D), the value of the focus pixel Rx of the R image data 114 that is the component frame to be predicted is predicted according to the value of the pixel Bc positioned to the lower left in the RAW image data 100. In this case, the reference pixel (sample point belonging to Bc) of the B image data 113 that is the reference component frame is offset by one pixel to the left and one pixel downward as compared to the position of the focus pixel (sample point belonging to Rx) in the R image data 114 that is the component frame to be predicted. Therefore, the motion vector V(B)=(−1,1). Such motion compensation prediction has a high probability of being selected in a block of an image including a diagonal edge from the upper right to the lower left.

In this manner, by predicting the pixel value of the focus pixel Rx using one pixel B, the R image data 114 is encoded to the P-picture. Thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

FIG. 15 is a descriptive view showing an example 7 of pixel position compensation prediction between component frames. In FIG. 15, the value of the focus pixel of the component frame to be predicted is predicted according to the value of the pixel at a differing position in a reference component frame that differs in color component from the component frame to be predicted. In FIG. 15, the component frame to be predicted is the G1 image data 111, the focus pixel thereof is the pixel G1 x, and the reference component frame is the B image data 113 or the R image data 114.

The reference pattern of (A) shows an example in which the value of the focus pixel G1 x of the G1 image data 111 that is the component frame to be predicted is predicted using the average of two adjacent pixels Ba and Bb of the B image data 113 that is the reference component frame adjacent to the focus pixel G1 x. By predicting the pixel value of the focus pixel G1 x by averaging the two adjacent pixels Ba and Bb, the G1 image data 111 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

In (A), the pixel Bb and the focus pixel G1 x belong to the same sample point, but the focus pixel G1 x is affected by the pixels Ba and Bb due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels to the left in relation to the position of the focus pixel G1 x. Therefore, in this case, the motion vector V(B)=(−0.5,0).

The reference pattern of (B) shows an example in which the value of the focus pixel G1 x of the G1 image data 111 that is the component frame to be predicted is predicted using the average of two adjacent pixels Ra and Rb of the R image data 114 that is the reference component frame adjacent to the focus pixel G1 x. By predicting the pixel value of the focus pixel G1 x by averaging the two adjacent pixels Ra and Rb, the G1 image data 111 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the R image data 114, which is the reference component frame.

In (B), the reference pixel Rb and the focus pixel G1 x belong to the same sample point, but the focus pixel G1 x is affected by the pixels Ra and Rb due to interpolation. Thus, the range of the reference pixel of the R image data 114 that is the reference component frame is offset by 0.5 pixels upward in relation to the position of the focus pixel G1 x. Therefore, in this case, the motion vector V(R)=(0,−0.5).

The reference pattern of (C) is a combination of the pixel position compensation prediction of (A) and the pixel position compensation prediction of (B). In other words, (C) is an example in which the value of the focus pixel G1 x of the G1 image data 111 that is the component frame to be predicted is predicted using the average of two adjacent pixels Ba and Bb of the B image data 113 and two adjacent pixels Ra and Rb of the R image data 114, the B image data 113 and the R image data 114 being the reference component frames adjacent to the focus pixel G1 x.

By predicting the pixel value of the focus pixel G1 x by averaging the four adjacent pixels Ba, Bb, Ra, and Rb, the G1 image data 111 is encoded to the B-picture. Thus, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113 and the R image data 114, which are the reference component frames.

In (C), the pixels Bb and Rb and the focus pixel G1 x belong to the same sample point, but the focus pixel G1 x is affected by the pixels Ba, Bb, Ra, and Rb due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels to the left in relation to the position of the focus pixel G1 x, and the range of the reference pixel of the R image data 114 is offset by 0.5 pixels upward in relation to the position of the focus pixel G1 x. Therefore, in this case, the motion vector is defined as V(B)+V(R)=(−0.5,−0.5).

FIG. 16 is a descriptive view showing an example 8 of pixel position compensation prediction between component frames. In FIG. 16, the value of the focus pixel of the component frame to be predicted is predicted according to the value of the pixel at a differing position in a reference component frame that differs in color component from the component frame to be predicted. In FIG. 16, the component frame to be predicted is the G2 image data 112, the focus pixel thereof is the pixel G2 x, and the reference component frame is the B image data 113 or the R image data 114.

The reference pattern of (A) shows an example in which the value of the focus pixel G2 x of the G2 image data 112 that is the component frame to be predicted is predicted using the average of two adjacent pixels Ba and Bb of the B image data 113 that is the reference component frame adjacent to the focus pixel G2 x. By predicting the pixel value of the focus pixel G2 x by averaging the two adjacent pixels Ba and Bb, the G2 image data 112 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a vertical edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the B image data 113, which is the reference component frame.

In (A), the pixel Ba and the focus pixel G2 x belong to the same sample point, but the focus pixel G2 x is affected by the pixels Ba and Bb due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels downward in relation to the position of the focus pixel G2 x. Therefore, in this case, the motion vector V(B)=(0,0.5).

The reference pattern of (B) shows an example in which the value of the focus pixel G2 x of the G2 image data 112 that is the component frame to be predicted is predicted using the average of two adjacent pixels Ra and Rb of the R image data 114 that is the reference component frame adjacent to the focus pixel G2 x. By predicting the pixel value of the focus pixel G2 x by averaging the two adjacent pixels Ra and Rb, the G2 image data 112 is encoded to the P-picture. Such pixel position compensation prediction has a high probability of being selected in a block of an image including a horizontal edge. Also, it is possible to mitigate encoding distortion included in the decoded value of the R image data 114, which is the reference component frame.

In (B), the reference pixel Rb and the focus pixel G2 x belong to the same sample point, but the focus pixel G2 x is affected by the pixels Ra and Rb due to interpolation. Thus, the range of the reference pixel of the R image data 114 that is the reference component frame is offset by 0.5 pixels to the right in relation to the position of the focus pixel G2 x. Therefore, in this case, the motion vector V(R)=(0.5,0).

The reference pattern of (C) is a combination of the pixel position compensation prediction of (A) and the pixel position compensation prediction of (B). In other words, (C) is an example in which the value of the focus pixel G2 x of the G2 image data 112 that is the component frame to be predicted is predicted using the average of two adjacent pixels Ba and Bb of the B image data 113 and two adjacent pixels Ra and Rb of the R image data 114, the B image data 113 and the R image data 114 being the reference component frames adjacent to the focus pixel G2 x.

By predicting the pixel value of the focus pixel G2 x by averaging the four adjacent pixels Ba, Bb, Ra, and Rb, the G2 image data 112 is encoded to the B-picture. Thus, it is possible to further mitigate encoding distortion included in the decoded value of the B image data 113 and the R image data 114, which are the reference component frames.

In (C), the pixels Ba and Rb and the focus pixel G2 x belong to the same sample point, but the focus pixel G2 x is affected by the pixels Ba, Bb, Ra, and Rb due to interpolation. Thus, the range of the reference pixel of the B image data 113 that is the reference component frame is offset by 0.5 pixels downward in relation to the position of the focus pixel G2 x, and the range of the reference pixel of the R image data 114 is offset by 0.5 pixels to the right in relation to the position of the focus pixel G2 x. Therefore, in this case, the motion vector is defined as V(B)+V(R)=(0.5,0.5).

When the encoding unit 402 performs encoding to a P-picture or a B-picture, the pixel position compensation predictions shown in FIGS. 9 to 16 are tested, and the pixel position compensation prediction with the smallest difference is selected. For example, if the R image data 114 is to be predicted with reference to the B image data 113, then the encoding unit 402 tests the pixel position compensation prediction for each reference pattern of (A) to (E) in FIG. 13 and (A) to (D) in FIG. 14, and selects the reference pattern with the smallest difference. As a result, suitable and efficient pixel position compensation prediction can be executed.

In particular, if the reference pattern crosses an edge of the image, the difference in value between the pixels in the reference component frames surrounding the edge and the focus pixel of the component frame to be predicted is large. Thus, by appropriately selecting the reference pattern, it is possible to identify a reference pattern that does not cross an edge and thereby improve the encoding efficiency.
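The selection step can be sketched as a per-block search over candidate predictions. The text states only that the prediction with the smallest difference is selected; the sum of absolute differences (SAD) used below is an assumed metric, and the function and dictionary layout are illustrative.

```python
import numpy as np

def select_reference_pattern(block, candidates):
    """Return the label of the candidate prediction with the smallest
    difference from the block to be encoded.

    block      : 2-D array, the block of the component frame to predict.
    candidates : dict mapping a reference-pattern label (e.g. 'A'..'E')
                 to the 2-D prediction array for the same block.
    """
    def sad(pred):
        return int(np.abs(block.astype(np.int32)
                          - pred.astype(np.int32)).sum())

    best = min(candidates, key=lambda label: sad(candidates[label]))
    return best, sad(candidates[best])
```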

<Data Structure Example of Encoded Component Frame>

FIG. 17 is a descriptive drawing showing a data structure example for an encoded component frame. An encoded component frame 1700 has header information 1701 and an encoded data array 1702. The header information 1701 is information added by the encoding unit 402. The header information 1701 includes image format information 1711 and control information 1712.

The image format information 1711 includes the size of the component frame prior to encoding, the size of the encoded component frame 1700, identification information specifying the pattern of the color array 101, and the pixel count of the component frame. The control information 1712 includes the type of the encoded component frame 1700 (any one of I-picture, P-picture, and B-picture), identification information for the reference component frame, and the reference pattern used for the pixel position compensation prediction shown in FIGS. 9 to 16. The encoded data array 1702 is a data array in which the component frame is encoded.
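One possible in-memory layout of this header, sketched as Python dataclasses; all field names are illustrative, and only the items enumerated above are modeled.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ImageFormatInfo:
    """Image format information 1711."""
    original_size: Tuple[int, int]   # component frame size before encoding
    encoded_size: Tuple[int, int]    # size of the encoded component frame 1700
    color_array_id: int              # identifies the pattern of the color array 101
    pixel_count: int                 # pixel count of the component frame

@dataclass
class ControlInfo:
    """Control information 1712."""
    picture_type: str                     # 'I', 'P', or 'B'
    reference_frame_ids: Tuple[int, ...]  # identifies the reference component frames
    reference_pattern: str                # reference pattern used for pixel
                                          # position compensation prediction

@dataclass
class EncodedComponentFrame:
    """Encoded component frame 1700."""
    image_format: ImageFormatInfo    # part of header information 1701
    control: ControlInfo             # part of header information 1701
    data: bytes                      # encoded data array 1702
```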

<Example of Encoding Process Steps>

FIG. 18 is a flowchart showing an example of encoding process steps by the encoder 400. The encoder 400 receives input of the RAW image data 100 (step S1801), and uses the first generation unit 401 to separate the pixel groups of the RAW image data 100 for each color component and generate a component frame for each color component (step S1802). Next, the encoder 400 uses the encoding unit 402 to generate an I-picture by executing in-component-frame prediction encoding (step S1803).

Then, the encoder 400 uses the encoding unit 402 to execute inter-component-frame prediction encoding for the remaining component frames to generate a P-picture or a B-picture (step S1804). Lastly, the encoder 400 uses the recording unit 403 to store the encoded image data group that was encoded in steps S1803 and S1804 in the storage device 302 (step S1805).
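The steps S1801 to S1805 can be summarized as a short pipeline. In the sketch below, the 2x2 cell positions used to separate the component frames and the helper methods encode_intra, encode_inter, and store are assumptions for illustration; the actual cell layout is given by the color array 101.

```python
import numpy as np

def separate_components(raw):
    """S1802: split the RAW image data into one component frame per
    color component (2x2 cell positions assumed for illustration)."""
    g1 = raw[0::2, 0::2]
    g2 = raw[1::2, 1::2]
    b = raw[0::2, 1::2]
    r = raw[1::2, 0::2]
    return [g1, g2, b, r]

def encode_raw_image(raw, enc):
    """Sketch of steps S1801 to S1805 of FIG. 18."""
    frames = separate_components(raw)               # S1802
    encoded = [enc.encode_intra(frames[0])]         # S1803: I-picture
    for frame in frames[1:]:                        # S1804: P-/B-pictures
        encoded.append(enc.encode_inter(frame, references=encoded))
    enc.store(encoded)                              # S1805
    return encoded
```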

<Mechanical Configuration Example of Decoder>

FIG. 19 is a block diagram showing a mechanical configuration example of the decoder. The decoder 1900 has an acquisition unit 1901, a decoding unit 1902, and a second generation unit 1903. The acquisition unit 1901 acquires the encoded component frame 1700 that was encoded by the encoder 400, the decoding unit 1902 decodes the encoded component frame 1700 to the component frame using the control information 1712, and the second generation unit 1903 generates the RAW image data 100 from the decoded component frames using the image format information 1711.

The acquisition unit 1901, the decoding unit 1902, and the second generation unit 1903 are specifically functions realized by the LSI 304, or by the processor 301 executing programs stored in the storage device 302, for example.

The acquisition unit 1901 acquires first encoded image data and second encoded image data. The first encoded image data is data attained by performing in-frame prediction encoding of the first image data constituted of pixel groups of the first color component. The second encoded image data is acquired by encoding, on the basis of the first image data, the second image data constituted of pixel groups of the second color component differing from the first color component.

Also, the acquisition unit 1901 acquires third encoded image data. The third encoded image data is data attained by encoding, on the basis of the first image data, the third image data constituted of pixel groups of the third color component. Also, the acquisition unit 1901 acquires fourth encoded image data. The fourth encoded image data is data attained by encoding, on the basis of the first image data, fourth image data constituted of pixel groups of the fourth color component.

The decoding unit 1902 decodes the first encoded image data acquired by the acquisition unit 1901 to the first image data using the control information 1712, and decodes the second encoded image data acquired by the acquisition unit 1901 to the second image data on the basis of the first image data. Specifically, for example, the decoding unit 1902 decodes the first encoded image data that is an I-picture to the first image data, and decodes the second encoded image data that is a P-picture to the second image data using the first image data according to the reference pattern applied to pixel position compensation prediction.

Also, the decoding unit 1902 decodes the third encoded image data acquired by the acquisition unit 1901 to the third image data using the control information 1712, on the basis of the first image data. Specifically, for example, if the third encoded image data is a P-picture, the decoding unit 1902 decodes the third encoded image data to the third image data using the first image data according to the reference pattern applied to pixel position compensation prediction, and if the third encoded image data is a B-picture, the decoding unit 1902 decodes the third encoded image data to the third image data using the first image data and the second image data according to the reference pattern applied to pixel position compensation prediction.

Also, the decoding unit 1902 decodes the fourth encoded image data acquired by the acquisition unit 1901 to the fourth image data using the control information 1712, on the basis of the first image data. Specifically, for example, if the fourth encoded image data is a P-picture, the decoding unit 1902 decodes the fourth encoded image data to the fourth image data using any of the first image data to the third image data according to the reference pattern applied to pixel position compensation prediction, and if the fourth encoded image data is a B-picture, the decoding unit 1902 decodes the fourth encoded image data to the fourth image data using two pieces of image data among the first image data to the third image data according to the reference pattern applied to pixel position compensation prediction.
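Taken together, the rules above amount to decoding frames in dependency order: the I-picture first, then each P- or B-picture against already-decoded component frames. A minimal sketch, assuming each encoded frame carries the control information modeled earlier and that decode_intra and decode_inter are illustrative helpers:

```python
def decode_component_frames(encoded_frames, dec):
    """Decode an ordered list of encoded component frames, where each
    frame's reference_frame_ids index previously decoded frames."""
    decoded = []
    for frame in encoded_frames:
        if frame.control.picture_type == 'I':
            decoded.append(dec.decode_intra(frame))
        else:
            # 'P' uses one reference component frame, 'B' uses two.
            refs = [decoded[i] for i in frame.control.reference_frame_ids]
            decoded.append(dec.decode_inter(frame, refs))
    return decoded
```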

The second generation unit 1903 identifies the pattern of the color array from the image format information 1711, and generates the RAW image data 100 in which the first color component to the fourth color component are repeatedly arranged, from the pixel groups of the first image data to the fourth image data decoded by the decoding unit 1902, such that the color array pattern is the identified color array 101.

<Configuration Example of Decoding Unit 1902>

FIG. 20 is a block diagram showing a configuration example of the decoding unit 1902. The decoding unit 1902 has a variable-length code decoding unit 2001, an inverse quantization unit 2002, an inverse orthogonal transformation unit 2003, an addition unit 2004, a third accumulation unit 2005, and a second pixel position compensation unit 2006.

The variable-length code decoding unit 2001 decodes the inputted encoded component frame and outputs a quantization coefficient and a motion vector. The decoded quantization coefficient is inputted to the inverse quantization unit 2002, and the decoded motion vector is inputted to the second pixel position compensation unit 2006.

The inverse quantization unit 2002 performs inverse quantization on the quantized coefficient at the block level to decode the frequency coefficient. The inverse orthogonal transformation unit 2003 performs inverse orthogonal transformation on the frequency coefficient decoded by the inverse quantization unit 2002 to decode the prediction error value (or the signal of the original image).

The addition unit 2004 adds the decoded prediction error value to a prediction value generated by the second pixel position compensation unit 2006, thereby outputting the decoded image data at the block level. The image data outputted from the addition unit 2004 is outputted as the component frame and inputted to the third accumulation unit 2005.

The third accumulation unit 2005 accumulates the decoded value of the image as the reference component frame. Image data not referred to in pixel position compensation prediction thereafter is sequentially deleted from the third accumulation unit 2005. The second pixel position compensation unit 2006 outputs, to the addition unit 2004, the prediction values predicted at the block level for the image to be decoded, on the basis of the motion vector and the reference component frame.
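The block-level data flow of FIG. 20 can be summarized as follows; the method names on the dec object are illustrative stand-ins for the units described above, not an actual API.

```python
def decode_block(encoded_block, dec):
    """Sketch of one block passing through the decoding unit 1902."""
    # Variable-length code decoding unit 2001: recover the quantization
    # coefficient and the motion vector.
    qcoef, mv = dec.vlc_decode(encoded_block)
    freq = dec.inverse_quantize(qcoef)        # inverse quantization unit 2002
    residual = dec.inverse_transform(freq)    # inverse orthogonal transformation unit 2003
    # Second pixel position compensation unit 2006: prediction from the
    # reference component frames held in the third accumulation unit 2005.
    prediction = dec.predict(mv, dec.reference_frames)
    return residual + prediction              # addition unit 2004
```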

<Example of Decoding Process Steps>

FIG. 21 is a flowchart showing an example of decoding process steps by the decoder 1900. The decoder 1900 uses the acquisition unit 1901 to acquire the encoded image data group as the encoded component frame group (step S2101), and uses the decoding unit 1902 to decode the first encoded image data (I-picture) to the component frame (step S2102).

Next, the decoder 1900 uses the decoding unit 1902 to decode the subsequent encoded image data (P-picture or B-picture) to component frames (step S2103). Then, the decoder 1900 uses the second generation unit 1903 to combine the decoded component frame groups to restore the RAW image data 100 (step S2104).

Thus, according to Embodiment 1, by performing inter-component-frame prediction of the RAW image data 100, relying on the property that hue and chroma produce a high degree of correlation among component frames, it is possible to improve encoding efficiency for the RAW image data 100 in which there is a high degree of correlation among the component frames. Also, it is possible to restore the original RAW image data 100 even if encoding is performed by inter-component-frame prediction encoding.

Embodiment 2

In Embodiment 2, white balance adjustment is performed on the RAW image data 100 before it is encoded; when decoding, the component frames are decoded to generate the white balance-adjusted RAW image data, and inverse white balance adjustment is then performed to generate the RAW image data. In Embodiment 2, differences from Embodiment 1 will be primarily described, and the same components as those of Embodiment 1 are assigned the same reference characters and descriptions thereof are omitted.

<Encoding and Decoding Example>

FIG. 22 is a descriptive drawing showing an encoding and decoding example of Embodiment 2. (E) WB (white balance) adjustment, (A) separation, and (B) encoding are executed by the encoder 400, and (C) decoding, (D) combining, and (F) inverse WB adjustment are executed by the decoder 1900.

(E) The encoder 400 performs white balance adjustment on the RAW image data 100. White balance adjustment is executed according to the white balance setting items in the encoder 400 (auto, manual, tungsten, cloudy, fluorescent, shady, daylight, etc.) that are set when generating the RAW image data by performing imaging. The RAW image data 100 that has undergone white balance adjustment is the WB-adjusted RAW image data 2200.

(A) The encoder 400 generates a component frame for each color component from the WB-adjusted RAW image data 2200. Specifically, for example, the encoder 400 generates G1 image data 2211 that is a color component frame for green (G1), G2 image data 2212 that is a color component frame for green (G2), B image data 2213 that is a color component frame for blue (B), and R image data 2214 that is a color component frame for red (R).

The G1 image data 2211 is image data constituted of a G1 pixel group from the color arrays 101 in the WB-adjusted RAW image data 2200. The G2 image data 2212 is image data constituted of a G2 pixel group from the color arrays 101 in the WB-adjusted RAW image data 2200.

The B image data 2213 is image data constituted of a B pixel group from the color arrays 101 in the WB-adjusted RAW image data 2200. The R image data 2214 is image data constituted of an R pixel group from the color arrays 101 in the WB-adjusted RAW image data 2200.

(B) The encoder 400 encodes the component frames using prediction between the component frames. Specifically, for example, the encoder 400 encodes one component frame by in-frame prediction encoding to generate an I-picture, and encodes the remaining component frames to a P-picture or a B-picture by inter-frame prediction encoding using the I-picture. Here, the G1 image data 2211 is encoded to G1 encoded image data 2221, the G2 image data 2212 is encoded to G2 encoded image data 2222, the B image data 2213 is encoded to B encoded image data 2223, and the R image data 2214 is encoded to R encoded image data 2224.

(C) The decoder 1900 decodes the encoded component frame group. Specifically, for example, the decoder 1900 decodes the I-picture, and then uses the component frame decoded from the I-picture to decode the P-picture or B-picture, to generate another component frame. In other words, the decoder 1900 decodes the G1 encoded image data 2221, the G2 encoded image data 2222, the B encoded image data 2223, and the R encoded image data 2224 to generate the G1 image data 2211, the G2 image data 2212, the B image data 2213, and the R image data 2214.

(D) The decoder 1900 combines the component frames in the decoded component frame group to generate the WB-adjusted RAW image data 2200. Specifically, for example, pixels G1, G2, B, and R in the same position in the G1 image data 2211, the G2 image data 2212, the B image data 2213, and the R image data 2214 are arranged according to the color array 101 to decode the WB-adjusted RAW image data 2200 from the G1 image data 2211, the G2 image data 2212, the B image data 2213, and the R image data 2214.
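The combining step (D) is the inverse of the separation step: the decoded planes are re-interleaved according to the color array 101. A minimal numpy sketch, assuming the same illustrative 2x2 cell positions as in the separation sketch shown earlier:

```python
import numpy as np

def combine_components(g1, g2, b, r):
    """Re-interleave four decoded component frames into the WB-adjusted
    RAW image data (2x2 cell positions assumed for illustration)."""
    h, w = g1.shape
    raw = np.empty((2 * h, 2 * w), dtype=g1.dtype)
    raw[0::2, 0::2] = g1
    raw[1::2, 1::2] = g2
    raw[0::2, 1::2] = b
    raw[1::2, 0::2] = r
    return raw
```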

(F) The decoder 1900 performs inverse WB adjustment to convert the WB-adjusted RAW image data 2200 to the original RAW image data 100.

Thus, by performing inter-component-frame prediction of the WB-adjusted RAW image data 2200, relying on the property that hue and chroma produce a high degree of correlation among component frames, it is possible to improve encoding efficiency for the WB-adjusted RAW image data 2200. Also, it is possible to restore the original WB-adjusted RAW image data 2200 even if encoding is performed by inter-component-frame prediction encoding.

Also, blue (B) and red (R) have lower pixel values, that is, lower signal levels, than green (G), and thus have low correlation with it. Thus, by performing white balance adjustment on the RAW image data 100 prior to encoding, the signal levels of blue (B) and red (R) are brought closer to the signal level of green (G). As a result, it is possible to improve the encoding efficiency for the WB-adjusted RAW image data 2200.

Also, the white balance adjustment is executed on the RAW image data 100 prior to encoding, and thus white balance adjustment of the decoded RAW image data is unnecessary. However, inverse white balance adjustment may be performed when decoding in order to restore the original RAW image data 100 from the WB-adjusted RAW image data 2200.

<Mechanical Configuration Example of Encoder 400>

FIG. 23 is a block diagram showing a mechanical configuration example of the encoder 400 according to Embodiment 2. The encoder 400, in addition to the components shown in Embodiment 1, has a white balance adjustment unit 2301 and a white balance detection unit 2302. The white balance adjustment unit 2301 and the white balance detection unit 2302 are specifically functions realized by the LSI 304, or by the processor 301 executing programs stored in the storage device 302, for example.

The white balance adjustment unit 2301 performs white balance adjustment on the RAW image data 100 according to the white balance setting items (auto, manual, tungsten, cloudy, fluorescent, shady, daylight, etc.), and outputs the WB-adjusted RAW image data 2200 to the first generation unit 401. Thus, the first generation unit 401 performs separation of component frames such as shown in (A) of FIG. 22 on the WB-adjusted RAW image data 2200. White balance adjustment is performed by multiplying the pixel values of the RAW image data 100 by a white balance adjustment gain coefficient, with the black level of the RAW image data 100 as the standard. For example, where the black level of the RAW image data 100 is OB, the pixel value of the B component is XB, and the white balance adjustment gain coefficient of the B component is AB, the pixel value YB of the B component after white balance adjustment is calculated as follows:

When OB≤XB, YB=(XB−OB)×AB+OB

When XB<OB, YB=(OB−XB)×AB+OB
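Applied to a whole component plane, the two branches above can be written as a single vectorized operation. A minimal numpy sketch of the stated formulas; the function name and the floating-point arithmetic are illustrative.

```python
import numpy as np

def wb_adjust(x, ob, gain):
    """White balance adjustment of one color component, with the black
    level OB as the standard.

    x    : pixel values of the component (e.g. XB for the B component)
    ob   : black level OB of the RAW image data 100
    gain : white balance adjustment gain coefficient (e.g. AB)
    """
    x = x.astype(np.float64)
    # OB <= XB: YB = (XB - OB) * AB + OB
    # XB <  OB: YB = (OB - XB) * AB + OB
    return np.where(x >= ob, (x - ob) * gain + ob, (ob - x) * gain + ob)
```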

The white balance detection unit 2302 detects the white balance type suited to the RAW image data 100 according to the RAW image data 100 and notifies the white balance adjustment unit 2301. As a result, the white balance adjustment unit 2301 performs white balance adjustment on the RAW image data 100 according to the white balance type (auto, manual, tungsten, cloudy, fluorescent, shady, daylight, etc.) received in the notification.

Also, the white balance detection unit 2302 notifies the encoding unit 402 of information (white balance control information) identifying the detected type of white balance. Specifically, for example, the white balance control information is outputted to the variable-length coding unit 605 as shown in FIG. 6. As a result, the encoding unit 402 can assign the white balance control information to the control information 1712 in the header information 1701 of the encoded component frame 1700 encoded by the encoding unit 402.

Here, the white balance control information is constituted of information indicating that the WB-adjusted RAW image data 2200 has undergone white balance adjustment (hereinafter referred to as “adjustment information”), the white balance adjustment gain coefficient of the B component of the WB-adjusted RAW image data 2200, and the white balance adjustment gain coefficient of the R component of the WB-adjusted RAW image data 2200, for example.

The decoder 1900 to be described later can recognize that the WB-adjusted RAW image data 2200 has undergone white balance adjustment according to the adjustment information assigned to the control information 1712. Also, the decoder 1900 uses the white balance adjustment gain coefficients of the B component and the R component assigned to the control information 1712 to enable inverse white balance adjustment during decoding.

The control information 1712 has assigned thereto at least one of the above-mentioned adjustment information, the white balance adjustment gain coefficient of the B component, and the white balance adjustment gain coefficient of the R component.

Embodiment 2 shows an example in which white balance adjustment is performed on the RAW image data 100. However, instead of white balance adjustment, a process of reducing at least one of the following differences may be performed: the difference between the value of the R color component data and the value of the G color component data; the difference between the value of the G color component data and the value of the B color component data; and the difference between the value of the B color component data and the value of the R color component data.

<Example of Encoding Process Steps>

FIG. 24 is a flowchart showing an example of encoding process steps by the encoder 400 according to Embodiment 2. The encoder 400 receives input of the RAW image data 100 (step S2401), performs white balance adjustment on the RAW image data 100 using the white balance adjustment unit 2301, and outputs the WB-adjusted RAW image data 2200 (step S2402). The encoder 400 uses the first generation unit 401 to separate the pixel groups of the WB-adjusted RAW image data 2200 for each color component and generate a component frame for each color component (step S2403).

Next, the encoder 400 uses the encoding unit 402 to generate an I-picture by executing in-component-frame prediction encoding (step S2404). Then, the encoder 400 uses the encoding unit 402 to execute inter-component-frame prediction encoding for the remaining component frames to generate a P-picture or a B-picture (step S2405). Lastly, the encoder 400 uses the recording unit 403 to store the encoded image data group that was encoded in steps S2404 and S2405 in the storage device 302 (step S2406).

<Mechanical Configuration Example of Decoder 1900>

FIG. 25 is a block diagram showing a mechanical configuration example of the decoder 1900 according to Embodiment 2. The decoder 1900, in addition to the components shown in Embodiment 1, has an inverse white balance adjustment unit 2504. The inverse white balance adjustment unit 2504 is specifically a function realized by the LSI 304, or by the processor 301 executing programs stored in the storage device 302, for example. In Embodiment 2, the second generation unit 1903 generates the WB-adjusted RAW image data 2200.

The inverse white balance adjustment unit 2504 refers to the white balance control information in the header information 1701 assigned to the WB-adjusted RAW image data 2200 attained from the second generation unit 1903 to perform inverse white balance adjustment on the WB-adjusted RAW image data 2200 and restore the original RAW image data 100.
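For pixel values at or above the black level, the adjustment formula given earlier inverts to XB = (YB − OB) / AB + OB. The sketch below assumes this inverse and that the gain coefficient is read from the white balance control information in the header information 1701; it is an illustration, not the embodiment's exact procedure.

```python
import numpy as np

def inverse_wb_adjust(y, ob, gain):
    """Inverse white balance adjustment of one color component for the
    OB <= XB branch: XB = (YB - OB) / AB + OB."""
    y = y.astype(np.float64)
    return (y - ob) / gain + ob
```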

<Example of Decoding Process Steps>

FIG. 26 is a flowchart showing an example of decoding process steps by the decoder 1900 according to Embodiment 2. After steps S2101 to S2104, the decoder 1900 uses the inverse white balance adjustment unit 2504 to perform inverse white balance adjustment on the WB-adjusted RAW image data 2200, to restore the original RAW image data 100 (step S2605).

Thus, according to Embodiment 2, in a manner similar to Embodiment 1, by performing inter-component-frame prediction of the WB-adjusted RAW image data 2200, relying on the property that hue and chroma produce a high degree of correlation among component frames, it is possible to improve encoding efficiency for the WB-adjusted RAW image data 2200. Also, it is possible to restore the original RAW image data 100 even if encoding is performed by inter-component-frame prediction encoding.

Also, blue (B) and red (R) have lower pixel values, that is, lower signal levels, than green (G), and thus have low correlation with it. Thus, by performing white balance adjustment on the RAW image data 100 prior to encoding, the signal levels of blue (B) and red (R) are brought closer to the signal level of green (G). As a result, it is possible to improve the encoding efficiency for the WB-adjusted RAW image data 2200. Also, the white balance adjustment is executed on the RAW image data 100 prior to encoding, and thus, by omitting the inverse white balance adjustment unit 2504, white balance adjustment of the decoded RAW image data can be made unnecessary.

Embodiment 3

Embodiment 3 is an example in which encoding and decoding are performed on RAW video data in which the RAW image data 100 is arrayed along a time axis. In Embodiment 3, differences from Embodiment 1 will be primarily described, and the same components as those of Embodiment 1 are assigned the same reference characters and descriptions thereof are omitted.

<Encoding and Decoding Example>

FIG. 27 is a descriptive drawing showing an encoding and decoding example of Embodiment 3. (A) Separation and (B) encoding are executed by the encoder 400, and (C) decoding and (D) combining are executed by the decoder 1900.

(A) The encoder 400 acquires RAW video data 2700 in which the RAW image data 100 is arrayed along a time axis, and generates a component frame for each color component for each piece of RAW image data 100. As a result, a G1 image data array 2711, a G2 image data array 2712, a B image data array 2713, and an R image data array 2714 are attained.

(B) The encoder 400 encodes the color component frames using prediction between the component frames. Specifically, for example, the encoder 400 encodes one component frame group by in-frame prediction encoding to generate an I-picture, and encodes the remaining component frame groups to a P-picture or a B-picture by inter-frame prediction encoding using the I-picture. Here, the G1 image data array 2711 is encoded to a G1 encoded image data array 2721, the G2 image data array 2712 is encoded to a G2 encoded image data array 2722, the B image data array 2713 is encoded to a B encoded image data array 2723, and the R image data array 2714 is encoded to an R encoded image data array 2724.

(C) The decoder 1900 decodes the encoded component frame group. Specifically, for example, the decoder 1900 decodes the I-picture, and then uses the component frame decoded from the I-picture to decode the P-picture or B-picture, to generate another component frame. In other words, the decoder 1900 decodes the G1 encoded image data array 2721, the G2 encoded image data array 2722, the B encoded image data array 2723, and the R encoded image data array 2724, to generate the G1 image data array 2711, the G2 image data array 2712, the B image data array 2713, and the R image data array 2714.

(D) The decoder 1900 combines the component frames in the decoded component frame group to generate the RAW image data 100. Specifically, for example, pixels G1, G2, B, and R in the same position in the G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114 are arranged according to the color array 101 to decode the RAW image data 100 sequentially, thereby decoding the RAW video data 2700.

Thus, by performing inter-component-frame prediction of the RAW image data 100, relying on the property that hue and chroma produce a high degree of correlation among component frames, it is possible to improve encoding efficiency for the RAW image data 100 in which there is a high degree of correlation among the component frames, and therefore to improve the encoding efficiency for the RAW video data 2700. Also, it is possible to restore the original RAW image data 100, and therefore the RAW video data 2700, even if encoding is performed by inter-component-frame prediction encoding.

<Reference Direction Example for Component Frames>

FIG. 28 is a descriptive drawing showing a reference direction example for component frames. (A) shows one example of the RAW video data 2700. (B) and (C) show an example of the reference direction for component frames in the RAW video data 2700. In (B) and (C), for ease of description, among the chronological RAW image data 1 to n (n being an integer of 2 or greater), an example of a reference direction for the component frames in RAW image data 1 and RAW image data 2 will be described.

(B) shows a reference direction for a case in which the component frames from the same RAW image data 1 and 2 are inputted in the order of the G1 image data 111, the G2 image data 112, the B image data 113, and the R image data 114. In the RAW image data 1 and the RAW image data 2, the G1 image data 111, which is the first component frame, is encoded to an I-picture. The subsequently inputted G2 image data 112 is encoded into a P-picture by inter-frame prediction encoding with the preceding G1 image data 111 as the reference frame.

The subsequently inputted B image data 113 is encoded into a P-picture or a B-picture by inter-frame prediction encoding with at least one of the preceding G1 image data 111 and G2 image data 112 as the reference frame. The R image data 114 inputted last is encoded into a P-picture or a B-picture by inter-frame prediction encoding with at least one of the preceding G1 image data 111, G2 image data 112, and B image data 113 as the reference frame.

In (C), the reference direction in the first RAW image data 1 is the same as the RAW image data 1 of (B). Regarding the RAW image data 2, the first G1 image data 111 is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the G1 image data 111 of the preceding RAW image data 1 as the reference frame.

The subsequently inputted G2 image data 112 is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the G1 image data 111 of the preceding RAW image data 1 and the G1 image data 111 of the RAW image data 2 as the reference frame.

The subsequently inputted B image data 113 is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the B image data 113 of the preceding RAW image data 1, the G1 image data 111 of the RAW image data 2, and the G2 image data 112 of the RAW image data 2 serving as the reference frame.

The R image data 114 inputted last is encoded into a P-picture or a B-picture by inter-frame prediction encoding with the component frame of at least one of the R image data 114 of the preceding RAW image data 1, the G1 image data 111 of the RAW image data 2, the G2 image data 112 of the RAW image data 2, and the B image data 113 of the RAW image data 2 as the reference frame.

The reference directions shown in FIG. 28 are merely examples, and encoding is possible with input orders for the component frames other than those of (B) and (C). Also, the encoding unit 402 uses the luminance values of the pixels of the image capture element 353, which do not depend on the color components, and thus can perform encoding even when a differing color component is used as the reference component frame.
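The reference directions of (B) and (C) can be tabulated; the dictionaries below are an illustrative rendering of FIG. 28, where '@prev' marks a component frame of the preceding RAW image data.

```python
# FIG. 28(B): each component frame may reference only earlier component
# frames of the same RAW image data.
REFS_B = {
    'G1': [],                  # I-picture
    'G2': ['G1'],              # P-picture
    'B':  ['G1', 'G2'],        # P- or B-picture
    'R':  ['G1', 'G2', 'B'],   # P- or B-picture
}

# FIG. 28(C), from RAW image data 2 onward: component frames may also
# reference the same color component of the preceding RAW image data.
REFS_C = {
    'G1': ['G1@prev'],
    'G2': ['G1@prev', 'G1'],
    'B':  ['B@prev', 'G1', 'G2'],
    'R':  ['R@prev', 'G1', 'G2', 'B'],
}
```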

<Slice-Level Encoding Example>

FIG. 29 is a descriptive drawing showing an example of encoding at the slice level. (A) shows the slice level of each component frame generated from the chronological RAW image data 1 to 4. A slice is data formed by splitting the component frame, and is one unit for encoding. Here, each component frame (G1 image data 111, G2 image data 112, B image data 113, R image data 114) has n slices of the same size (n being an integer of 2 or greater). (B) in FIG. 28 is shown as an example of the order in which the component frames are inputted, but another input order may be used.

(B) shows an encoding process example at the slice level. The arrows show the encoding order. That is, a G1 component slice 1, a G2 component slice 1, a B component slice 1, and an R component slice 1, which have “1” as the slice number, are encoded in the stated order, and then a G1 component slice 2, a G2 component slice 2, a B component slice 2, and an R component slice 2, which have “2” as the slice number, are encoded in the stated order. In this manner, the component slices are encoded in ascending order according to slice number, and lastly a G1 component slice n, a G2 component slice n, a B component slice n, and an R component slice n, which have “n” as the slice number, are encoded in the stated order.

In this manner, encoding is performed among component frames at the slice level, allowing for improvement in encoding latency. The reference direction among component slices of the same slice number may be such that the G1 component slice is encoded to an I-picture as shown in (B) of FIG. 28, or the G1 component slice is encoded to a P-picture as shown in (C). In FIG. 29, an example of encoding at the slice level was described, but decoding may likewise be performed at the slice level, similar to encoding at the slice level. As a result, it is possible to improve the decoding latency.
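The slice-level encoding order of FIG. 29(B) is a simple nested loop: all component slices of slice number 1, then those of slice number 2, and so on. A minimal sketch with an illustrative encode_slice callback:

```python
def encode_at_slice_level(component_frames, n_slices, encode_slice):
    """Encode component slices in ascending slice-number order, cycling
    through the components G1, G2, B, R within each slice number."""
    for s in range(1, n_slices + 1):
        for comp in ('G1', 'G2', 'B', 'R'):
            encode_slice(comp, component_frames[comp][s - 1])
```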

Thus, in Embodiment 3, by performing inter-component-frame prediction of the RAW image data 100, relying on the property that hue and chroma produce a high degree of correlation among component frames, it is possible to improve encoding efficiency for the RAW image data 100 in which there is a high degree of correlation among the component frames, and therefore to improve the encoding efficiency for the RAW video data 2700. Also, it is possible to restore the original RAW image data 100, and therefore the RAW video data 2700, even if encoding is performed by inter-component-frame prediction encoding.

Also, encoding of component frames is performed at the slice level, allowing for improvement in inter-component-frame encoding latency. Similarly, by decoding component frames at the slice level, it is possible to improve component-frame decoding latency.

Embodiment 4

Embodiment 4 is an example in which white balance adjustment is performed on the RAW video data 2700, in which the RAW image data 100 is arrayed along a time axis, before encoding and decoding are performed, and inverse white balance adjustment is then performed. In Embodiment 4, differences from Embodiments 1 and 3 will be primarily described, and the same components as those of Embodiments 1 and 3 are assigned the same reference characters and descriptions thereof are omitted.

<Encoding and Decoding Example>

FIG. 30 is a descriptive drawing showing an encoding and decoding example of Embodiment 4. (E) WB adjustment, (A) separation, and (B) encoding are executed by the encoder 400, and (C) decoding, (D) combining, and (F) inverse WB adjustment are executed by the decoder 1900.

(E) The encoder 400 performs white balance adjustment on the respective RAW image data 100 of the RAW video data 2700. White balance adjustment is executed according to the white balance settings in the encoder 400 (auto, manual, tungsten, cloudy, fluorescent, shady, daylight, etc.). The RAW image data 100 that has undergone white balance adjustment is designated as WB-adjusted RAW image data 2200, and the chronologically arranged WB-adjusted RAW image data 2200 is designated as WB-adjusted RAW video data 3000.
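As a minimal sketch, the white balance adjustment of (E) can be pictured as applying a per-color gain to each pixel of the mosaic. The 2x2 layout (G1 B / R G2) and the gain values below are assumptions chosen for illustration, not values taken from the embodiment.

    import numpy as np

    # Hypothetical per-channel gains; actual values depend on the white
    # balance setting (auto, tungsten, daylight, ...) of the encoder 400.
    WB_GAINS = {"G1": 1.0, "G2": 1.0, "B": 1.8, "R": 1.4}

    def wb_adjust(raw):
        """Apply white balance gains to a mosaic whose 2x2 cells are
        laid out as G1 B / R G2 (one possible color array 101)."""
        out = raw.astype(np.float32)
        out[0::2, 0::2] *= WB_GAINS["G1"]  # G1 pixels
        out[0::2, 1::2] *= WB_GAINS["B"]   # B pixels
        out[1::2, 0::2] *= WB_GAINS["R"]   # R pixels
        out[1::2, 1::2] *= WB_GAINS["G2"]  # G2 pixels
        return out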

(A) The encoder 400 acquires the WB-adjusted RAW video data 3000 in which the WB-adjusted RAW image data 2200 is arrayed along a time axis, and generates a component frame for each color component for each piece of WB-adjusted RAW image data 2200. As a result, a WB-adjusted G1 image data array 3011, a WB-adjusted G2 image data array 3012, a WB-adjusted B image data array 3013, and a WB-adjusted R image data array 3014 are attained.
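The separation of (A) amounts to de-interleaving the mosaic into four quarter-resolution planes. A minimal sketch, assuming the same G1 B / R G2 layout as above:

    import numpy as np

    def separate(raw_wb):
        """Split a WB-adjusted mosaic into four component frames,
        one per color component (each half the width and height)."""
        g1 = raw_wb[0::2, 0::2]
        b  = raw_wb[0::2, 1::2]
        r  = raw_wb[1::2, 0::2]
        g2 = raw_wb[1::2, 1::2]
        return g1, g2, b, r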

(B) The encoder 400 encodes the color component frames between the component frames. Specifically, for example, the encoder 400 encodes one component frame group by in-frame prediction encoding to generate an I-picture, and encodes the remaining component frame groups to a P-picture or a B-picture by inter-frame prediction encoding that references the I-picture.
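A minimal sketch of the inter-component prediction step: the residual between a target component frame and a prediction formed from the reference frame is what gets coded. Using the co-located reference pixel as the prediction value is an assumption for illustration; the embodiment also provides for pixel-position compensation and reference patterns.

    import numpy as np

    def encode_inter_component(reference, target):
        """Encode a target component frame against a reference frame
        (e.g. the locally decoded G1 I-picture). Only the prediction
        residual would be transformed and entropy-coded in practice."""
        prediction = reference.astype(np.int32)  # co-located pixels as prediction
        residual = target.astype(np.int32) - prediction
        return residual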

Here, the WB-adjusted G1 image data array 3011 is encoded to a WB-adjusted G1 encoded image data array 3021, the WB-adjusted G2 image data array 3012 is encoded to a WB-adjusted G2 encoded image data array 3022, the WB-adjusted B image data array 3013 is encoded to a WB-adjusted B encoded image data array 3023, and the WB-adjusted R image data array 3014 is encoded to a WB-adjusted R encoded image data array 3024.

(C) The decoder 1900 decodes the encoded component frame group. Specifically, for example, the decoder 1900 decodes the I-picture, and then uses the component frame decoded from the I-picture to decode the P-picture or B-picture, to generate another component frame. In other words, the decoder 1900 decodes the WB-adjusted G1 encoded image data array 3021, the WB-adjusted G2 encoded image data array 3022, the WB-adjusted B encoded image data array 3023, and the WB-adjusted R encoded image data array 3024, to generate the WB-adjusted G1 image data array 3011, the WB-adjusted G2 image data array 3012, the WB-adjusted B image data array 3013, and the WB-adjusted R image data array 3014.
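The decoding side mirrors the prediction sketch above: the decoder reconstructs the target frame by adding the received residual back onto the prediction formed from the already-decoded reference frame. This is a sketch under the same co-located-prediction assumption.

    import numpy as np

    def decode_inter_component(reference, residual):
        """Reconstruct a component frame from its decoded reference
        frame and the transmitted prediction residual."""
        return (reference.astype(np.int32) + residual).astype(np.int32)

In this sketch, decode_inter_component(ref, encode_inter_component(ref, target)) returns the target exactly, which illustrates why the original component frames remain restorable under inter-component-frame prediction encoding.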

(D) The decoder 1900 combines the component frames in the decoded component frame group to generate the WB-adjusted RAW image data 2200. Specifically, for example, the pixels G1, G2, B, and R in the same position in the WB-adjusted G1 image data 2211, the WB-adjusted G2 image data 2212, the WB-adjusted B image data 2213, and the WB-adjusted R image data 2214 are arranged according to the color array 101 to restore the WB-adjusted RAW image data 2200 sequentially, thereby restoring the WB-adjusted RAW video data 3000.
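A minimal sketch of the combining step of (D), the inverse of the separate sketch above (same assumed 2x2 layout):

    import numpy as np

    def combine(g1, g2, b, r):
        """Re-interleave four decoded component frames into the
        WB-adjusted mosaic (G1 B / R G2 per 2x2 cell)."""
        h, w = g1.shape
        raw_wb = np.empty((2 * h, 2 * w), dtype=g1.dtype)
        raw_wb[0::2, 0::2] = g1
        raw_wb[0::2, 1::2] = b
        raw_wb[1::2, 0::2] = r
        raw_wb[1::2, 1::2] = g2
        return raw_wb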

(F) The decoder 1900 performs inverse WB adjustment to convert each piece of the WB-adjusted RAW image data 2200 to the original RAW image data 100, to restore the RAW video data 2700.
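Inverse WB adjustment simply divides out the gains applied in (E). A minimal sketch, assuming the same hypothetical WB_GAINS and layout as in the earlier sketches, and assuming the decoder obtains the gains from information assigned to the encoded data (as in claim 42 below):

    import numpy as np

    WB_GAINS = {"G1": 1.0, "G2": 1.0, "B": 1.8, "R": 1.4}  # same assumed gains

    def inverse_wb_adjust(raw_wb):
        """Undo the white balance gains, restoring the original RAW
        pixel values of the RAW image data 100."""
        out = raw_wb.astype(np.float32)
        out[0::2, 0::2] /= WB_GAINS["G1"]
        out[0::2, 1::2] /= WB_GAINS["B"]
        out[1::2, 0::2] /= WB_GAINS["R"]
        out[1::2, 1::2] /= WB_GAINS["G2"]
        return out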

Thus, by performing inter-component-frame prediction of the RAW image data 100, exploiting the property that similarity in hue and chroma yields a high degree of correlation among component frames, it is possible to improve the encoding efficiency for the WB-adjusted RAW image data 2200, in which there is a high degree of correlation among the component frames, and therefore to improve the encoding efficiency for the WB-adjusted RAW video data 3000. Also, it is possible to restore the original WB-adjusted RAW image data 2200, and therefore the WB-adjusted RAW video data 3000, even if encoding is performed by inter-component-frame prediction encoding.

Also, similar to Embodiment 3, encoding of component frames is performed at the slice level for the WB-adjusted RAW video data 3000, allowing for improvement in inter-component-frame encoding latency. Similarly, for the WB-adjusted G1 encoded image data array 3021, the WB-adjusted G2 encoded image data array 3022, the WB-adjusted B encoded image data array 3023, and the WB-adjusted R encoded image data array 3024, by decoding the component frames at the slice level, it is possible to improve the component frame decoding latency.

Thus, as described above, according to the present embodiment, by performing inter-component-frame prediction of the RAW image data 100, exploiting the property that similarity in hue and chroma yields a high degree of correlation among component frames, it is possible to improve the encoding efficiency for the RAW image data 100, in which there is a high degree of correlation among the component frames. Also, it is possible to restore the original RAW image data 100 even if encoding is performed by inter-component-frame prediction encoding.

DESCRIPTION OF THE REFERENCE NUMERALS

-   100 RAW image data, 101 a color array, 111 G1 image data, 112 G2 image data, 113 B image data, 114 R image data, 121 G1 encoded image data, 122 G2 encoded image data, 123 B encoded image data, 124 R encoded image data, 300 an information processing apparatus, 301 a processor, 302 a storage device, 353 an image capture element, 400 an encoder, 401 a first generation unit, 402 an encoding unit, 403 a recording unit, 610 a position offset detection unit, 611 a first pixel position compensation unit, 1700 an encoded component frame, 1701 header information, 1711 image format information, 1712 control information, 1900 a decoder, 1901 an acquisition unit, 1902 a decoding unit, 1903 a second generation unit, 2006 a second pixel position compensation unit, 2200 WB-adjusted RAW image data, 2301 a white balance adjustment unit, 2504 an inverse white balance adjustment unit, 2700 RAW video data, 3000 WB-adjusted RAW video data

1-24. (canceled)
25. An encoder, comprising: an adjustment unit configured to adjust a white balance of RAW image data in which a first color component and a second color component differing from the first color component are arranged in a repeating fashion; a generation unit configured to generate first image data constituted of a pixel group of the first color component and second image data constituted of a pixel group of the second color component, from white balance-adjusted RAW image data in which the white balance was adjusted by the adjustment unit; and an encoding unit configured to encode the second image data on the basis of the first image data.
26. The encoder according to claim 25, wherein the encoding unit is configured to generate a prediction value for the second image data on the basis of the first image data, and to encode the second image data on the basis of a difference between the second image data and the prediction value.
27. The encoder according to claim 25, further comprising: a detection unit configured to detect a white balance suited to the RAW image data, wherein the adjustment unit is configured to adjust the white balance of the RAW image data on the basis of information pertaining to the white balance detected by the detection unit.
28. The encoder according to claim 25, wherein, in encoding the second image data, the encoding unit is configured to compensate pixel positions between the first image data and the second image data.
29. The encoder according to claim 28, wherein, in encoding the second image data, the encoding unit is configured to compensate a focus pixel in the second image data with a specific reference pixel in the first image data at a position differing from the focus pixel.
 30. The encoder according to claim 29, wherein, in encoding the second image data, the encoding unit is configured to encode the second image data on the basis of a reference pattern, among a plurality of reference patterns constituted of the specific reference pixel, having a smallest difference from the focus pixel.
31. The encoder according to claim 25, wherein the encoding unit is configured to assign, to encoded data, information pertaining to the white balance performed on the RAW image data.
32. The encoder according to claim 25, wherein the adjustment unit is configured to acquire a plurality of pieces of the RAW image data and adjust the white balance for each of the pieces of RAW image data, wherein the generation unit is configured to generate the first image data and the second image data for each of the pieces of white balance-adjusted RAW image data for which the white balance was adjusted by the adjustment unit, and wherein the encoding unit is configured to encode the second image data on the basis of the first image data.
 33. The encoder according to claim 32, wherein the encoding unit is configured to encode the second image data separated from the same RAW image data as the first image data, on the basis of the first image data.
34. The encoder according to claim 33, wherein the encoding unit is configured to encode the second image data separated from another piece of the RAW image data differing from the first image data, on the basis of the first image data.
35. The encoder according to claim 34, wherein, on the basis of a predetermined region of the first image data, the encoding unit is configured to encode a region corresponding to the predetermined region in the second image data generated from the RAW image data differing from the first image data.
36. The encoder according to claim 32, wherein the encoding unit is configured to generate a prediction value for the second image data on the basis of the first image data, and to encode the second image data on the basis of a difference between the second image data and the prediction value.
 37. The encoder according to claim 34, further comprising: a detection unit configured to detect a white balance suited to each of the pieces of the RAW image data, wherein the adjustment unit is configured to adjust the white balance of each of the pieces of the RAW image data on the basis of information pertaining to the white balance detected by the detection unit.
38. A decoder, comprising: an acquisition unit configured to acquire first encoded image data in which first image data constituted of a pixel group of a first color component of white balance-adjusted RAW image data is encoded, and second encoded image data in which second image data constituted of a pixel group of a second color component differing from the first color component of the white balance-adjusted RAW image data is encoded on the basis of the first image data; a decoding unit configured to decode the first encoded image data acquired by the acquisition unit to the first image data and decode the second encoded image data acquired by the acquisition unit to the second image data on the basis of the first image data; a generation unit configured to generate the white balance-adjusted RAW image data in which the first color component and the second color component are arranged in a repeating fashion, on the basis of the first image data and the second image data decoded by the decoding unit; and an inverse adjustment unit configured to convert a color of the white balance-adjusted RAW image data back to a color prior to adjustment of the white balance.
39. The decoder according to claim 38, wherein the acquisition unit is configured to acquire second encoded image data attained by encoding the second image data on the basis of a difference between the second image data and a prediction value for the second image data generated on the basis of the first image data.
40. The decoder according to claim 39, wherein the decoding unit is configured to identify a reference pixel of the first image data on the basis of a reference pattern indicating a pixel position referred to when encoding a focus pixel of the second image data, and decode the focus pixel of the second image data from the second encoded image data on the basis of the reference pixel.
41. The decoder according to claim 38, wherein the acquisition unit is configured to acquire third encoded image data in which third image data constituted of a pixel group of a third color component generated from the white balance-adjusted RAW image data is encoded on the basis of the first image data, wherein the third color component is a same color component as either the first color component or the second color component, or differs from both the first color component and the second color component, wherein the decoding unit is configured to decode the third encoded image data to the third image data on the basis of the first image data, and wherein the generation unit is configured to generate the white balance-adjusted RAW image data in which the first color component, the second color component, and the third color component are arranged in a repeating fashion, on the basis of the first image data, the second image data, and the third image data decoded by the decoding unit.
42. The decoder according to claim 38, wherein the acquisition unit is configured to acquire information pertaining to the white balance performed on the white balance-adjusted RAW image data, and wherein the inverse adjustment unit is configured to use the information pertaining to the white balance in order to convert a color of the white balance-adjusted RAW image data back to a color prior to adjustment of the white balance.
43. The decoder according to claim 38, wherein the acquisition unit is configured to acquire a plurality of encoded frames including the first encoded image data and the second encoded image data, wherein the decoding unit, for each of the encoded frames, is configured to decode the first encoded image data to the first image data, decode the second encoded image data to the second image data on the basis of the first image data, and output a plurality of frames including the first image data and the second image data, and wherein the generation unit is configured to generate, for each of the frames, the white balance-adjusted RAW image data in which the first color component and the second color component are arranged in a repeating fashion, on the basis of the first image data and the second image data decoded by the decoding unit.
44. The decoder according to claim 43, wherein the acquisition unit is configured to acquire an encoded frame including second encoded image data attained by encoding the second image data on the basis of a difference between the second image data and a prediction value for the second image data generated on the basis of the first image data.